Consider a simple C program:
#include <stdio.h>
int main()
{
puts("Hello");
return 0;
}
Running it with GDB, having set LD_BIND_NOW=1 for simplicity, I can observe the following:
$ gdb -q ./test -ex 'b main' -ex r
Reading symbols from ./test...done.
Breakpoint 1 at 0x8048420
Starting program: /tmp/test
Breakpoint 1, 0x08048420 in main ()
(gdb) disas
Dump of assembler code for function main:
0x0804841d <+0>: push ebp
0x0804841e <+1>: mov ebp,esp
=> 0x08048420 <+3>: and esp,0xfffffff0
0x08048423 <+6>: sub esp,0x10
0x08048426 <+9>: mov DWORD PTR [esp],0x8048500
0x0804842d <+16>: call 0x80482c0 <puts@plt>
0x08048432 <+21>: mov eax,0x0
0x08048437 <+26>: leave
0x08048438 <+27>: ret
End of assembler dump.
(gdb) si 4
0x080482c0 in puts@plt ()
(gdb) disas
Dump of assembler code for function puts@plt:
=> 0x080482c0 <+0>: jmp DWORD PTR ds:0x8049670
0x080482c6 <+6>: push 0x0
0x080482cb <+11>: jmp 0x80482b0
End of assembler dump.
(gdb) si
_IO_puts (str=0x8048500 "Hello") at ioputs.c:35
35 {
(gdb)
Apparently, after binding the PLT entry to the function, we still do a two-step call:
call puts@plt
jmp [ds:puts_address]
Compare this with Win32, where every call to an imported function, e.g. MessageBoxA, looks like
call [ds:MessageBoxA_address]
i.e. it is done in a single step.
Even if taking lazy binding into account, it's still possible to have e.g. [puts_address] contain the call to _dl_runtime_resolve or whatever is needed on startup, so the one-step indirect call would still work.
So what's the reason for such a complication? Is this some sort of branch prediction or branch target prediction optimization?
EDIT in response to Employed Russian's answer (v2)
What I actually mean is that this indirection of call PLT; jump [GOT] is redundant even in the context of lazy binding. Consider the following example (relies on compilation without optimizations by gcc):
#include <stdio.h>
int main()
{
for(int i=0;i<3;++i)
{
puts("Hello");
__asm__ __volatile__("nop");
}
return 0;
}
Running it (with LD_BIND_NOW unset) in GDB:
$ gdb ./test -ex 'b main' -ex r -ex disas/r
Reading symbols from ./test...done.
Breakpoint 1 at 0x8048387
Starting program: /tmp/test
Breakpoint 1, 0x08048387 in main ()
Dump of assembler code for function main:
...
0x08048397 <+19>: c7 04 24 80 84 04 08 mov DWORD PTR [esp],0x8048480
0x0804839e <+26>: e8 11 ff ff ff call 0x80482b4 <puts@plt>
0x080483a3 <+31>: 90 nop
0x080483a4 <+32>: 83 44 24 1c 01 add DWORD PTR [esp+0x1c],0x1
...
Disassembling puts@plt, we can see the address of the GOT entry for puts:
(gdb) disas 'puts@plt'
Dump of assembler code for function puts@plt:
0x080482b4 <+0>: jmp DWORD PTR ds:0x8049580
0x080482ba <+6>: push 0x10
0x080482bf <+11>: jmp 0x8048284
End of assembler dump.
So we see it's 0x8049580. We can patch the code of main() to change e8 11 ff ff ff 90 (at address 0x804839e) into an indirect call through the GOT entry, i.e. call [ds:0x8049580], which is encoded as ff 15 80 95 04 08:
(gdb) set *(uint64_t*)0x804839e=0x44830804958015ff
(gdb) disas/r
Dump of assembler code for function main:
...
0x08048397 <+19>: c7 04 24 80 84 04 08 mov DWORD PTR [esp],0x8048480
0x0804839e <+26>: ff 15 80 95 04 08 call DWORD PTR ds:0x8049580
0x080483a4 <+32>: 83 44 24 1c 01 add DWORD PTR [esp+0x1c],0x1
...
Running the program after this still gives:
(gdb) c
Continuing.
Hello
Hello
Hello
[Inferior 1 (process 14678) exited normally]
I.e. the first call did the lazy binding, and the next two just used the result of the fixup (you can trace it yourself if you don't believe me).
So the question remains: why is this way of calling not used by GCC?
Apparently, after binding the PLT entry to the function, we still do a two-step call:
call puts@plt
jmp [ds:puts_address]
The compiler and linker can't know that you are going to set LD_BIND_NOW=1 at runtime, and so can't go back in time and re-write generated code to use direct call [puts_address].
See also recent -fno-plt patches on the gcc-patches mailing list.
Win32
Win32 doesn't allow lazy function resolution (at least not by default). In other words, they compile / link code that only works as if LD_BIND_NOW=1 is hard-coded at compile / link time. Some history here.
it's still possible to have e.g. [puts_address] contain the call to _dl_runtime_resolve or whatever is needed on startup, so the one-step indirect call would still work.
I think you are confused. The [puts_address] does contain _dl_runtime_resolve at startup (well, not exactly. Gory details). Your question is "why can't the call go directly to [puts_address], why is puts@plt needed?".
The answer is that _dl_runtime_resolve needs to know which function it is resolving. It can't deduce that info from arguments to puts. The entire raison d'être of puts@plt is exactly to supply that info to _dl_runtime_resolve.
Update:
Why can't call <puts@plt> be replaced with call *[puts@GOT]?
The answer is provided in the first -fno-plt patch I referenced:
"This comes with caveats. This cannot be generally done for all
functions marked extern as it is impossible for the compiler to say if
a function is "truly extern" (defined in a shared library). If a
function is not truly extern(ends up defined in the final executable),
then calling it indirectly is a performance penalty as it could have
been a direct call."
You could then ask: why can't the linker (which knows whether puts is defined in the same binary or in a separate DSO) rewrite the call *[puts@GOT] back into call <puts@plt>?
The answer is that these are different instructions (different op-codes), and linkers generally do not change instructions, only addresses within instructions (in response to relocation entries).
In theory the linker could do this, but no-one's bothered yet.
Related
While reading http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/#id1 a question came up:
How does a PIC shared library, after being loaded somewhere in the virtual address space of the process, know how to reference external variables?
Here is the code of the shared library in question:
#include <stdio.h>
extern long var;
void
shara_func(void)
{
printf("%ld\n", var);
}
Produce the object code, then the shared object (library):
gcc -fPIC -c lib1.c # produce PIC lib1.o
gcc -fPIC -shared lib1.o -o liblib1.so # produce PIC shared library
Disassemble shara_func in shared library:
objdump -d liblib1.so
...
00000000000006d0 <shara_func>:
6d0: 55 push %rbp
6d1: 48 89 e5 mov %rsp,%rbp
6d4: 48 8b 05 fd 08 20 00 mov 0x2008fd(%rip),%rax # 200fd8 <_DYNAMIC+0x1c8>
6db: 48 8b 00 mov (%rax),%rax
6de: 48 89 c6 mov %rax,%rsi
6e1: 48 8d 3d 19 00 00 00 lea 0x19(%rip),%rdi # 701 <_fini+0x9>
6e8: b8 00 00 00 00 mov $0x0,%eax
6ed: e8 be fe ff ff callq 5b0 <printf@plt>
6f2: 90 nop
6f3: 5d pop %rbp
6f4: c3 retq
...
I see that the instruction at address 0x6d4 loads rax from an address that is relative to the PC. I suppose that is an entry in the GOT: the GOT is referenced relative to the PC to get the address of the external variable var at runtime (it is resolved at runtime depending on where var was loaded).
After the instruction at 0x6db executes, rax holds the actual contents of the external variable; the value is then moved from rax to rsi, the second function argument passed in a register.
I thought there was only one GOT in process memory, yet here I see the library referencing a GOT. How does a shared library know the offset to the process's GOT when it (a PIC library) does not know where in process memory it will be loaded? Or does each shared library have its own GOT that is loaded along with it? I would be very glad if you could clear up my confusion.
I was thinking that there is only one GOT in process memory, however, see that library references GOT?
We can clearly see a .got section as part of the library. With readelf we can find which sections the library has and how they are loaded:
readelf -e liblib1.so
...
Section Headers:
[21] .got PROGBITS 0000000000200fd0 00000fd0
0000000000000030 0000000000000008 WA 0 0 8
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000000078c 0x000000000000078c R E 200000
LOAD 0x0000000000000df8 0x0000000000200df8 0x0000000000200df8
0x0000000000000230 0x0000000000000238 RW 200000
...
Section to Segment mapping:
Segment Sections...
00 ... .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame
01 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
02 .dynamic
So there is a .got section, but the runtime linker (ld-linux-x86-64.so.2 here, registered as the interpreter for dynamic ELFs) does not load sections; it loads segments, as described by the program headers of type LOAD. .got is part of segment 01, a LOAD segment with RW flags. Another library will have its own GOT (think about compiling liblib2.so from similar source: it will not know anything about liblib1.so and will have its own GOT). So the table is "global" only within one library, not across the whole program image in memory after loading.
How shared library knows offset to process's GOT when it(PIC library) does not know where in process memory it would be loaded?
This is done by the static linker when it takes several ELF objects and combines them into one library. The linker generates the .got section and places it at a known offset from the library code, so the code can address it PC-relative (RIP-relative). The layout is recorded in the program headers, and the segments keep the same relative distances when loaded, so this relative offset is the only address the library needs to reach its own GOT.
When objdump is used with the -r / -R flags, it prints the relocations (static / dynamic) recorded in the object file or library; either flag can be combined with -d. The lib1.o object still has a relocation here: the offset to the GOT is not yet known, so the displacement in the mov is all zeros:
$ objdump -dr lib1.o
lib1.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <shara_func>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # b <shara_func+0xb>
7: R_X86_64_REX_GOTPCRELX var-0x4
b: 48 8b 00 mov (%rax),%rax
e: 48 89 c6 mov %rax,%rsi
In the library file this was converted to a relative address by gcc -shared (which runs the linker ld through collect2):
$ objdump -d liblib1.so
liblib1.so: file format elf64-x86-64
00000000000006d0 <shara_func>:
6d0: 55 push %rbp
6d1: 48 89 e5 mov %rsp,%rbp
6d4: 48 8b 05 fd 08 20 00 mov 0x2008fd(%rip),%rax # 200fd8 <_DYNAMIC+0x1c8>
And finally, there is a dynamic relocation for the GOT slot, which the runtime linker (rtld) fills in with the actual address of var:
$ objdump -R liblib1.so
liblib1.so: file format elf64-x86-64
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
...
0000000000200fd8 R_X86_64_GLOB_DAT var
Let's use your lib, adding an executable that defines var, compiling it and running it with rtld debugging enabled:
$ cat main.c
long var;
int main(){
shara_func();
return 0;
}
$ gcc main.c -llib1 -L. -o main -Wl,-rpath=`pwd`
$ LD_DEBUG=all ./main 2>&1 |less
...
311: symbol=var; lookup in file=./main [0]
311: binding file /test3/liblib1.so [0] to ./main [0]: normal symbol `var'
So the linker was able to bind the relocation for var to the "main" ELF file, where it is defined:
$ gdb -q ./main
Reading symbols from ./main...(no debugging symbols found)...done.
(gdb) b main
Breakpoint 1 at 0x4006da
(gdb) r
Starting program: /test3/main
Breakpoint 1, 0x00000000004006da in main ()
(gdb) disassemble shara_func
Dump of assembler code for function shara_func:
0x00007ffff7bd56d0 <+0>: push %rbp
0x00007ffff7bd56d1 <+1>: mov %rsp,%rbp
0x00007ffff7bd56d4 <+4>: mov 0x2008fd(%rip),%rax # 0x7ffff7dd5fd8
0x00007ffff7bd56db <+11>: mov (%rax),%rax
0x00007ffff7bd56de <+14>: mov %rax,%rsi
The mov in your function is unchanged. rax after shara_func+4 is 0x601040, which lies in the third mapping of ./main according to /proc/$pid/maps:
00601000-00602000 rw-p 00001000 08:07 6691394 /test3/main
And it was loaded from main according to this program header (readelf -e ./main):
LOAD 0x0000000000000df0 0x0000000000600df0 0x0000000000600df0
0x0000000000000248 0x0000000000000258 RW 200000
It is part of the .bss section:
[26] .bss NOBITS 0000000000601038 00001038
0000000000000010 0000000000000000 WA 0 0 8
After stepping to shara_func+11, we can check the value in the GOT:
(gdb) b shara_func
(gdb) r
(gdb) si
0x00007ffff7bd56db in shara_func () from /test3/liblib1.so
1: x/i $pc
=> 0x7ffff7bd56db <shara_func+11>: mov (%rax),%rax
(gdb) p $rip+0x2008fd
$6 = (void (*)()) 0x7ffff7dd5fd8
(gdb) x/2x 0x7ffff7dd5fd8
0x7ffff7dd5fd8: 0x00601040 0x00000000
Who wrote the correct value to this GOT entry?
(gdb) watch *0x7ffff7dd5fd8
Hardware watchpoint 2: *0x7ffff7dd5fd8
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /test3/main
Hardware watchpoint 2: *0x7ffff7dd5fd8
Old value = <unreadable>
New value = 6295616
0x00007ffff7de36bf in elf_machine_rela (..) at ../sysdeps/x86_64/dl-machine.h:435
(gdb) bt
#0 0x00007ffff7de36bf in elf_machine_rela (...) at ../sysdeps/x86_64/dl-machine.h:435
#1 elf_dynamic_do_Rela (...) at do-rel.h:137
#2 _dl_relocate_object (...) at dl-reloc.c:258
#3 0x00007ffff7ddaf5b in dl_main (...) at rtld.c:2072
#4 0x00007ffff7df0462 in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7fffffffde20,
dl_main=dl_main@entry=0x7ffff7dd89a0 <dl_main>) at ../elf/dl-sysdep.c:249
#5 0x00007ffff7ddbe7a in _dl_start_final (arg=0x7fffffffde20) at rtld.c:307
#6 _dl_start (arg=0x7fffffffde20) at rtld.c:413
#7 0x00007ffff7dd7cc8 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) x/2x 0x7ffff7dd5fd8
0x7ffff7dd5fd8: 0x00601040 0x00000000
The runtime linker of glibc did (rtld.c), just before calling main. Here is the source (a slightly different version): http://code.metager.de/source/xref/gnu/glibc/sysdeps/x86_64/dl-machine.h
case R_X86_64_GLOB_DAT:
case R_X86_64_JUMP_SLOT:
*reloc_addr = value + reloc->r_addend;
break;
With reverse stepping we can see the history of the code and the old value, 0:
(gdb) b _dl_relocate_object
(gdb) r
(gdb) dis 3
(gdb) target record-full
(gdb) c
(gdb) disp/i $pc
(gdb) rsi
(gdb) rsi
(gdb) rsi
(gdb) x/2x 0x7ffff7dd5fd8
0x7ffff7dd5fd8: 0x00000000 0x00000000
=> 0x7ffff7de36b8 <_dl_relocate_object+1560>: add 0x10(%rbx),%rax
=> 0x7ffff7de36bc <_dl_relocate_object+1564>: mov %rax,(%r10)
=> 0x7ffff7de36bf <_dl_relocate_object+1567>: nop
I see strange behaviour with gcc on x86 (32-bit) Linux. I generate a signalling NaN using gcc's builtin __builtin_nansf(""), which produces 0x7fa00000. After this value is returned from a function as a float, it has changed to 0x7fe00000. Here is a short example:
#include <stdio.h>
float f = __builtin_nansf("");
float y;
float func (void)
{
return f;
}
int main (void)
{
printf("%x\n", *((int*)&f));
y = func();
printf("%x\n", *((int*)&y));
}
The program was compiled with gcc-4.6.2 program.c; its output:
7fa00000
7fe00000
Gdb:
(gdb) p/x f
$2 = 0x7fa00000
...
(gdb) si
0x08048412 in func ()
1: x/i $pc
0x8048412 <func+14>: flds -0x4(%ebp)
(gdb) x/x $ebp-4
0xbfffeb34: 0x7fa00000
(gdb) si
(gdb) info all-regis
st0 nan(0xe000000000000000) (raw 0x7fffe000000000000000)
... //after return from func
(gdb) si
0x0804843d in main ()
1: x/i $pc
0x804843d <main+38>: fstps 0x804a024
(gdb) si
(gdb) x/x 0x804a024
0x804a024 <y>: 0x7fe00000
Why is my signalling NaN modified? How can I prevent this modification?
I'm not sure you can prevent this. Loading an sNaN on x87 typically raises an INVALID exception and then converts the value to a qNaN by setting the most significant bit of the (23-bit) mantissa, that is, by OR'ing with 0x00400000.
In the Intel® 64 and IA-32 Architectures Software Developer Manuals, Vol 1, 4.8.3.4 describes sNaN/qNaN handling. Chapter 8 deals with x87 FPU programming. Volume 3, 22.18 also describes how NaNs are handled by the x87 FPU.
I don't see any bits in the X87 control word that will yield the behaviour you desire for sNaN propagation.
After a Google search for "gcc 7fa00000" I found bug 57484 in GCC's bugzilla, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57484, with several useful comments.
Uroš Bizjak (i386 port maintainer in GCC) says in comments 11, 12, 14 and in the last one that x87 and the x86-32 ABI are not designed to fully support the IEEE 754 standard, and that the "issue is unfortunately unfixable":
The ABI is just wrong for the underlying x87 hardware as far as NaNs are concerned.
This issue is unfortunately unfixable. x87 and x86-32 ABI are just not designed to handle all details of IEEE 754 standard.
According to Uroš, with legacy x87 on the x86 gcc target, loading a float or double from memory into the x87 FP registers (stack) counts as a format conversion, which turns signaling NaNs (sNaN) into quiet NaNs (qNaN). The -msse2 -mfpmath=sse options make all math evaluation happen in SSE2, but the function still returns its FP value via the x87 stack:
$ gcc-4.6.3 -msse2 -mfpmath=sse test.c -o sse2math.out
$ objdump -d sse2math.out
./c.out: file format elf32-i386
...
08048404 <func>:
8048404: 55 push %ebp
8048405: 89 e5 mov %esp,%ebp
8048407: 83 ec 04 sub $0x4,%esp
804840a: a1 14 a0 04 08 mov 0x804a014,%eax
804840f: 89 45 fc mov %eax,-0x4(%ebp)
8048412: f3 0f 10 45 fc movss -0x4(%ebp),%xmm0
8048417: f3 0f 11 45 fc movss %xmm0,-0x4(%ebp)
804841c: d9 45 fc flds -0x4(%ebp)
804841f: c9 leave
8048420: c3 ret
After adding one more option, -mno-fp-ret-in-387 (the full set is -msse2 -mfpmath=sse -mno-fp-ret-in-387), the x87 FP registers are no longer used to return the float:
08048404 <func>:
8048404: 55 push %ebp
8048405: 89 e5 mov %esp,%ebp
8048407: a1 14 a0 04 08 mov 0x804a014,%eax
804840c: 5d pop %ebp
804840d: c3 ret
But the -mno-fp-ret-in-387 option changes the ABI and may break many libraries.
In my Linux program, I need a function that takes an address addr and checks whether a callq instruction placed at addr calls a specific function func loaded from a shared library. I mean, I need to check whether I have something like callq func@PLT at addr.
So, on Linux, how do I get from a callq func@PLT instruction to the real address of the function func?
You can only find out about that at runtime, after the dynamic linker resolves the actual load address.
Warning: What follows is slightly deeper magic ...
To illustrate what's happening use a debugger:
#include <stdio.h>
int main(int argc, char **argv) { printf("Hello, World!\n"); return 0; }
Compile it (gcc -O8 ...). objdump -d on the binary shows (notwithstanding the optimization that substitutes puts() for printf() of a plain string ...):
Disassembly of section .init:
[ ... ]
Disassembly of section .plt:
0000000000400408 <__libc_start_main@plt-0x10>:
400408: ff 35 a2 04 10 00 pushq 1049762(%rip) # 5008b0 <_GLOBAL_OFFSET_TABLE_+0x8>
40040e: ff 25 a4 04 10 00 jmpq *1049764(%rip) # 5008b8 <_GLOBAL_OFFSET_TABLE_+0x10>
[ ... ]
0000000000400428 <puts@plt>:
400428: ff 25 9a 04 10 00 jmpq *1049754(%rip) # 5008c8 <_GLOBAL_OFFSET_TABLE_+0x20>
40042e: 68 01 00 00 00 pushq $0x1
400433: e9 d0 ff ff ff jmpq 400408 <_init+0x18>
[ ... ]
0000000000400500 <main>:
400500: 48 83 ec 08 sub $0x8,%rsp
400504: bf 0c 06 40 00 mov $0x40060c,%edi
400509: e8 1a ff ff ff callq 400428 <puts@plt>
40050e: 31 c0 xor %eax,%eax
400510: 48 83 c4 08 add $0x8,%rsp
400514: c3 retq
Now load it into gdb. Then:
$ gdb ./tcc
GNU gdb Red Hat Linux (6.3.0.0-0.30.1rh)
[ ... ]
(gdb) x/3i 0x400428
0x400428: jmpq *1049754(%rip) # 0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>
0x40042e: pushq $0x1
0x400433: jmpq 0x400408
(gdb) x/gx 0x5008c8
0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>: 0x000000000040042e
Notice this value points back to the instruction directly following the first jmpq; this means the puts@plt slot, on first invocation, will simply "fall through" to:
(gdb) x/3i 0x400408
0x400408: pushq 1049762(%rip) # 0x5008b0 <_GLOBAL_OFFSET_TABLE_+8>
0x40040e: jmpq *1049764(%rip) # 0x5008b8 <_GLOBAL_OFFSET_TABLE_+16>
0x400414: nop
(gdb) x/gx 0x5008b0
0x5008b0 <_GLOBAL_OFFSET_TABLE_+8>: 0x0000000000000000
(gdb) x/gx 0x5008b8
0x5008b8 <_GLOBAL_OFFSET_TABLE_+16>: 0x0000000000000000
The function address and argument aren't initialized yet.
This is the state just after program load, but before executing. Now start executing it:
(gdb) break main
Breakpoint 1 at 0x400500
(gdb) run
Starting program: tcc
(no debugging symbols found)
(no debugging symbols found)
Breakpoint 1, 0x0000000000400500 in main ()
(gdb) x/i 0x400428
0x400428: jmpq *1049754(%rip) # 0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>
(gdb) x/gx 0x5008c8
0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>: 0x000000000040042e
So this hasn't changed yet - but the targets (the GOT contents for the libc initialization) are different now:
(gdb) x/gx 0x5008b0
0x5008b0 <_GLOBAL_OFFSET_TABLE_+8>: 0x0000002a9566b9a8
(gdb) x/gx 0x5008b8
0x5008b8 <_GLOBAL_OFFSET_TABLE_+16>: 0x0000002a955609f0
(gdb) disas 0x0000002a955609f0
Dump of assembler code for function _dl_runtime_resolve:
0x0000002a955609f0 <_dl_runtime_resolve+0>: sub $0x38,%rsp
[ ... ]
I.e. at program load time, the dynamic linker will resolve the "init" parts first. It substitutes the GOT references with pointers that redirect into the dynamic linking code.
Therefore, when first calling an external-to-the-binary function through the .plt reference, it'll jump into the linker again. Let it do that, then inspect the program after that - the state has changed again:
(gdb) break *0x0000000000400514
Breakpoint 2 at 0x400514
(gdb) continue
Continuing.
Hello, World!
Breakpoint 2, 0x0000000000400514 in main ()
(gdb) x/i 0x400428
0x400428: jmpq *1049754(%rip) # 0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>
(gdb) x/gx 0x5008c8
0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>: 0x0000002a956c8870
(gdb) disas 0x0000002a956c8870
Dump of assembler code for function puts:
0x0000002a956c8870 <puts+0>: mov %rbx,0xffffffffffffffe0(%rsp)
[ ... ]
So there's your redirect right into libc now - the PLT reference to puts() finally got resolved.
The instructions telling the linker where to insert the actual function load addresses (as we've seen it do for _dl_runtime_resolve) come from special sections in the ELF binary:
$ readelf -a tcc
[ ... ]
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
[ ... ]
INTERP 0x0000000000000200 0x0000000000400200 0x0000000000400200
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
[ ... ]
Dynamic section at offset 0x700 contains 21 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
[ ... ]
Relocation section '.rela.plt' at offset 0x3c0 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
0000005008c0 000100000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
0000005008c8 000200000007 R_X86_64_JUMP_SLO 0000000000000000 puts + 0
There's more to ELF than just the above, but these three pieces tell the kernel's binary format handler "this ELF binary has an interpreter" (which is the dynamic linker) that needs to be loaded / initialized first, that it requires libc.so.6, and that offsets 0x5008c0 and 0x5008c8 in the program's writeable data section must be substituted by the load addresses for __libc_start_main and puts, respectively, when the step of dynamic linking is actually performed.
How exactly that happens, from ELF's point of view, is up to the details of the interpreter (aka, the dynamic linker implementation).
"call 0x80482f0 <puts#plt>"? Just need help with one line of code in a 'hello world' program in x86 assembly.
NOTE: i'm running ubuntu linux while programming/debugging this, using gcc as the compiler and gdb for the debugger.
I am reading Hacking: The art of Exploitation V2 and I compiled this C program:
#include <stdio.h>

int main()
{
int i;
for(i=0; i<10; i++)
{
printf("Hello, world\n");
}
return 0;
}
into this program in assembly:
0x080483b4 <+0>: push ebp
0x080483b5 <+1>: mov ebp,esp
0x080483b7 <+3>: and esp,0xfffffff0
0x080483ba <+6>: sub esp,0x20
0x080483bd <+9>: mov DWORD PTR [esp+0x1c],0x0
0x080483c5 <+17>: jmp 0x80483d8 <main+36>
0x080483c7 <+19>: mov DWORD PTR [esp],0x80484b0
0x080483ce <+26>: call 0x80482f0 <puts@plt>
=> 0x080483d3 <+31>: add DWORD PTR [esp+0x1c],0x1
0x080483d8 <+36>: cmp DWORD PTR [esp+0x1c],0x9
0x080483dd <+41>: jle 0x80483c7 <main+19>
0x080483df <+43>: mov eax,0x0
0x080483e4 <+48>: leave
0x080483e5 <+49>: ret
now.. i understand every portion of this program, until it gets to:
0x080483ce <+26>: call 0x80482f0 <puts@plt>
what i do not understand is.. if "Hello, world\n" is stored at 0x80484b0, and that address is then stored into the address at ESP, why does the:
0x080483ce <+26>: call 0x80482f0 <puts@plt>
refer to 0x80482f0, instead of [esp] or just "0x80484b0" to print "Hello, world\n" to the screen? i used gdb and i cannot figure out what exactly is being referenced with 0x80482f0.. any help would be great
thanks (and remember, im just starting out with this stuff, so im a noob)
also.. i copy and pasted the disassembled main function from gdb for convenience, if you need any more info, just ask. and if you would like to explain that one command for me, that would be great as well because i've only used "int 80h"'s for printing stuff to the screen before
0x80482f0 is the address of the puts function. To be more precise, it points to the entry for puts() in the procedure linkage table (PLT), which is basically just a bunch of JMP <some routine in a shared library> instructions (it's a little more complex than that, but that's not important for this discussion). The puts function looks for its argument on the stack, i.e. where main stored it at [esp] just before the call.
You may be wondering where that puts() call came from - the compiler here was smart enough to see that you didn't actually use any format string parameters in your call to printf(), and replaced that call with a call to the (somewhat faster) puts(). If you'll look closely, you'll see that it also removed the newline from your string, because puts() appends a newline after printing the string it is given.
There are compiler options in MSVC to enable the automatic generation of instrumentation calls on entering and exiting functions. These hooks are called _penter() and _pexit(). The options to the compiler are:
/Gh Enable _penter Hook Function
/GH Enable _pexit Hook Function
Is there a pragma or some sort of function declaration that will turn off the instrumentation on a per function basis? I know that using __declspec(naked) functions will not be instrumented but this isn't always a very practical option. I'm using MSVC both on PC and on a non-X86 platform and the non-X86 platform is a pain to manually write epilog/prolog in assembler (not to mention it messes up the debugger stack tracing).
If this is only available on a per-file (compiler option) basis, I think I will have to split the special functions out into a separate file to turn the option off, but it'd be much easier if I could just control it on a per-function basis.
The fallback plan if this can't be done is to just move the functions to their own CPP translation unit and compile separately without the options.
I don't see any way to do this. Given that you would have to locate and handle every affected function anyway, perhaps moving them into their own module(s) is not such a big deal.
The asker is aware of this, but it is worth writing out the disqualified approach for future reference. /Gh and /GH do not instrument naked functions. You can declare the function you want to opt out as naked and manually supply the standard prolog/epilog, as shown below:
void instrumented_fn(void *p)
{
/* Function body */
}
__declspec(naked) void uninstrumented_fn(void *p)
{
__asm
{
/* prolog */
push ebp
mov ebp, esp
sub esp, __LOCAL_SIZE
}
/* Function body */
__asm
{
/* epilog */
mov esp, ebp
pop ebp
ret
}
}
An example instrumented function disassembly, showing the calls to _penter and _pexit:
537b0: e8 7c d9 ff ff call 0x51131
537b5: 55 push %ebp
537b6: 8b ec mov %esp,%ebp
537b8: 83 ec 40 sub $0x40,%esp
537bb: 53 push %ebx
537bc: 56 push %esi
537bd: 57 push %edi
537be: 90 nop
537bf: 90 nop
537c0: 90 nop
537c1: 5f pop %edi
537c2: 5e pop %esi
537c3: 5b pop %ebx
537c4: 8b e5 mov %ebp,%esp
537c6: 5d pop %ebp
537c7: e8 01 d9 ff ff call 0x510cd
537cc: c3 ret
The equivalent uninstrumented function disassembly (naked body plus standard prolog/epilog):
51730: 55 push %ebp
51731: 8b ec mov %esp,%ebp
51733: 83 ec 40 sub $0x40,%esp
51736: 90 nop
51737: 90 nop
51738: 90 nop
51739: 8b e5 mov %ebp,%esp
5173b: 5d pop %ebp
5173c: c3 ret