How shared library finds GOT section? - linux

While I was reading http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/#id1
question came:
How does PIC shared library after being loaded somewhere in virtual address space of the process knows how to reference external variables?
Here is code of shared library in question:
#include <stdio.h>
extern long var;
void
shara_func(void)
{
printf("%ld\n", var);
}
Produce object code, then shared object(library):
gcc -fPIC -c lib1.c # produce PIC lib1.o
gcc -fPIC -shared lib1.o -o liblib1.so # produce PIC shared library
Disassemble shara_func in shared library:
objdump -d liblib1.so
...
00000000000006d0 <shara_func>:
6d0: 55 push %rbp
6d1: 48 89 e5 mov %rsp,%rbp
6d4: 48 8b 05 fd 08 20 00 mov 0x2008fd(%rip),%rax # 200fd8 <_DYNAMIC+0x1c8>
6db: 48 8b 00 mov (%rax),%rax
6de: 48 89 c6 mov %rax,%rsi
6e1: 48 8d 3d 19 00 00 00 lea 0x19(%rip),%rdi # 701 <_fini+0x9>
6e8: b8 00 00 00 00 mov $0x0,%eax
6ed: e8 be fe ff ff callq 5b0 <printf#plt>
6f2: 90 nop
6f3: 5d pop %rbp
6f4: c3 retq
...
I see that instruction at 0x6d4 address moves some address that is relative to PC to rax, I suppose that is the entry in GOT, GOT referenced relatively from PC to get address of external variable var at runtime(it is resolved at runtime depending where var was loaded).
Then after executing instruction at 0x6db we get external variable's actual content placed in rax, then move value from rax to rsi - second function parameter passed in register.
I was thinking that there is only one GOT in process memory, however,
see that library references GOT? How shared library knows offset to process's GOT when it(PIC library) does not know where in process memory it would be loaded? Or does each shared library has its own GOT that is loaded with her? I would be very glad if you clarify my confusion.

I was thinking that there is only one GOT in process memory, however, see that library references GOT?
We clearly see .got section as part of the library. With readelf we can find what are the sections of the library and how they are loaded:
readelf -e liblib1.so
...
Section Headers:
[21] .got PROGBITS 0000000000200fd0 00000fd0
0000000000000030 0000000000000008 WA 0 0 8
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000000078c 0x000000000000078c R E 200000
LOAD 0x0000000000000df8 0x0000000000200df8 0x0000000000200df8
0x0000000000000230 0x0000000000000238 RW 200000
...
Section to Segment mapping:
Segment Sections...
00 ... .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame
01 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
02 .dynamic
So, there is section .got, but runtime linker ld-linux.so.2 (registered as interpreter for dynamic ELFs) does not load sections; it loads segments as described by Program header with LOAD type. .got is part of segment 01 LOAD with RW flags. Other library will have own GOT (think about compiling liblib2.so from the similar source, it will not know anything about liblib1.so and will have own GOT); so it is "Global" only for the library; but not to the whole program image in memory after loading.
How shared library knows offset to process's GOT when it(PIC library) does not know where in process memory it would be loaded?
It is done by static linker when it takes several ELF objects and combine them all into one library. Linker will generate .got section and put it to some place with known offset from the library code (pc-relative, rip-relative). It writes instructions to program header, so the relative address is known and it is the only needed address to access own GOT.
When objdump is used with -r / -R flags, it will print information about relocations (static / dynamic) recorded in the ELF file or library; it can be combined with -d flag. lib1.o object had relocation here; no known offset to GOT, mov has all zero:
$ objdump -dr lib1.o
lib1.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <shara_func>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # b <shara_func+0xb>
7: R_X86_64_REX_GOTPCRELX var-0x4
b: 48 8b 00 mov (%rax),%rax
e: 48 89 c6 mov %rax,%rsi
In library file this was converted to relative address by gcc -shared (it calls ld variant collect2 inside):
$ objdump -d liblib1.so
liblib1.so: file format elf64-x86-64
00000000000006d0 <shara_func>:
6d0: 55 push %rbp
6d1: 48 89 e5 mov %rsp,%rbp
6d4: 48 8b 05 fd 08 20 00 mov 0x2008fd(%rip),%rax # 200fd8 <_DYNAMIC+0x1c8>
And finally, there is dynamic relocation into GOT to put here actual address of var (done by rtld - ld-linux.so.2):
$ objdump -R liblib1.so
liblib1.so: file format elf64-x86-64
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
...
0000000000200fd8 R_X86_64_GLOB_DAT var
Let's use your lib, adding executable with definition, compiling it and running with rtld debugging enabled:
$ cat main.c
long var;
int main(){
shara_func();
return 0;
}
$ gcc main.c -llib1 -L. -o main -Wl,-rpath=`pwd`
$ LD_DEBUG=all ./main 2>&1 |less
...
311: symbol=var; lookup in file=./main [0]
311: binding file /test3/liblib1.so [0] to ./main [0]: normal symbol `var'
So, linker was able to bind relocation for var to the "main" ELF file where it is defined:
$ gdb -q ./main
Reading symbols from ./main...(no debugging symbols found)...done.
(gdb) b main
Breakpoint 1 at 0x4006da
(gdb) r
Starting program: /test3/main
Breakpoint 1, 0x00000000004006da in main ()
(gdb) disassemble shara_func
Dump of assembler code for function shara_func:
0x00007ffff7bd56d0 <+0>: push %rbp
0x00007ffff7bd56d1 <+1>: mov %rsp,%rbp
0x00007ffff7bd56d4 <+4>: mov 0x2008fd(%rip),%rax # 0x7ffff7dd5fd8
0x00007ffff7bd56db <+11>: mov (%rax),%rax
0x00007ffff7bd56de <+14>: mov %rax,%rsi
No changes in mov in your func. rax after func+4 is 0x601040, it is third mapping of ./main according to /proc/$pid/maps:
00601000-00602000 rw-p 00001000 08:07 6691394 /test3/main
And it was loaded from main after this program header (readelf -e ./main)
LOAD 0x0000000000000df0 0x0000000000600df0 0x0000000000600df0
0x0000000000000248 0x0000000000000258 RW 200000
It is part of .bss section:
[26] .bss NOBITS 0000000000601038 00001038
0000000000000010 0000000000000000 WA 0 0 8
After stepping to func+11, we can check value in GOT:
(gdb) b shara_func
(gdb) r
(gdb) si
0x00007ffff7bd56db in shara_func () from /test3/liblib1.so
1: x/i $pc
=> 0x7ffff7bd56db <shara_func+11>: mov (%rax),%rax
(gdb) p $rip+0x2008fd
$6 = (void (*)()) 0x7ffff7dd5fd8
(gdb) x/2x 0x7ffff7dd5fd8
0x7ffff7dd5fd8: 0x00601040 0x00000000
Who did write correct value to this GOT entry?
(gdb) watch *0x7ffff7dd5fd8
Hardware watchpoint 2: *0x7ffff7dd5fd8
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /test3/main
Hardware watchpoint 2: *0x7ffff7dd5fd8
Old value = <unreadable>
New value = 6295616
0x00007ffff7de36bf in elf_machine_rela (..) at ../sysdeps/x86_64/dl-machine.h:435
(gdb) bt
#0 0x00007ffff7de36bf in elf_machine_rela (...) at ../sysdeps/x86_64/dl-machine.h:435
#1 elf_dynamic_do_Rela (...) at do-rel.h:137
#2 _dl_relocate_object (...) at dl-reloc.c:258
#3 0x00007ffff7ddaf5b in dl_main (...) at rtld.c:2072
#4 0x00007ffff7df0462 in _dl_sysdep_start (start_argptr=start_argptr#entry=0x7fffffffde20,
dl_main=dl_main#entry=0x7ffff7dd89a0 <dl_main>) at ../elf/dl-sysdep.c:249
#5 0x00007ffff7ddbe7a in _dl_start_final (arg=0x7fffffffde20) at rtld.c:307
#6 _dl_start (arg=0x7fffffffde20) at rtld.c:413
#7 0x00007ffff7dd7cc8 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) x/2x 0x7ffff7dd5fd8
0x7ffff7dd5fd8: 0x00601040 0x00000000
Runtime linker of glibc did (rtld.c), just before calling main - here is the source (bit different version) - http://code.metager.de/source/xref/gnu/glibc/sysdeps/x86_64/dl-machine.h
329 case R_X86_64_GLOB_DAT:
330 case R_X86_64_JUMP_SLOT:
331 *reloc_addr = value + reloc->r_addend;
332 break;
With reverse stepping we can get history of code and old value = 0:
(gdb) b _dl_relocate_object
(gdb) r
(gdb) dis 3
(gdb) target record-full
(gdb) c
(gdb) disp/i $pc
(gdb) rsi
(gdb) rsi
(gdb) rsi
(gdb) x/2x 0x7ffff7dd5fd8
0x7ffff7dd5fd8: 0x00000000 0x00000000
=> 0x7ffff7de36b8 <_dl_relocate_object+1560>: add 0x10(%rbx),%rax
=> 0x7ffff7de36bc <_dl_relocate_object+1564>: mov %rax,(%r10)
=> 0x7ffff7de36bf <_dl_relocate_object+1567>: nop

Related

Why GOT entry offset appears wrong?

I wrote simple shared library:
extern void some_func(void);
void
function(void)
{
some_func();
}
Compiled/built:
gcc -fPIC -mcmodel=large -c test.c -o test.o
gcc -fPIC -shared test.o -o libtest.so
Disassembled, to see how some_func is referenced:
$ objdump -d libtest.so
00000000000006a0 <function>:
6a0: 55 push %rbp
6a1: 48 89 e5 mov %rsp,%rbp
6a4: 41 57 push %r15
6a6: 48 83 ec 08 sub $0x8,%rsp
6aa: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 6aa <function+0xa>
6b1: 49 bb 56 09 20 00 00 movabs $0x200956,%r11
6b8: 00 00 00
6bb: 4c 01 d8 add %r11,%rax
6be: 49 89 c7 mov %rax,%r15
6c1: 48 ba 80 f5 df ff ff movabs $0xffffffffffdff580,%rdx
6c8: ff ff ff
6cb: 48 01 c2 add %rax,%rdx
6ce: ff d2 callq *%rdx
6d0: 90 nop
6d1: 48 83 c4 08 add $0x8,%rsp
6d5: 41 5f pop %r15
6d7: 5d pop %rbp
6d8: c3 retq
Looked where .got.plt is located:
$ readelf -S libtest.so
...
[21] .got.plt PROGBITS 0000000000201000 00001000
0000000000000020 0000000000000008 WA 0 0 8
...
What is the relocation:
$ readelf -r libtest.so
Relocation section '.rela.plt' at offset 0x538 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000201018 000400000007 R_X86_64_JUMP_SLO 0000000000000000 some_func + 0
In 6aa-6bb we get absolute location of GOT: 6aa + 0x200956 = 0x201000
That agrees with readelf -S libtest.so 's output.
We skip 3 reserved bytes in GOT(functions-related) and determine that some_func's absolute address should be found at +0x18(forth byte from GOT) offset at runtime.
That agrees with readelf -r libtest.so.
But 6c1 instruction in objdump's disassembly shows:
movabs $0xfff...dff580, %rdx
I expect that source operand will hold +0x18 (offset from GOT, its address located at rax), but instead it has some large negative number.
Could you explain what it shows that number but not 0x18?
There are two kinds of relocations: static and dynamic (1); one for static linker ld and other for loader (dynamic linker, rtld) - ld-linux.so.2 for linux's glibc 2.* (check Dynamic Linking and Loading, 1999 or Static Linkers and Dyanmic Link Loaders).
When you use objdump to dump relocations, it has -r option for static relocations, and -R for dynamic relocations.
Your case is not just GOT, it is GOT.PLT - GOT used for procedute linkage. This kind of access uses dynamic relocations. So, you should check output of objdump -dR libtest.so, it will show you both disassembly and dynamic relocations in it.
Cited line from readelf -r libtest.so is just for PLT table, not for the code.
http://www.airs.com/blog/archives/41
or function calls, the program linker will set up a PLT entry to look like this:
jmp *offset(%ebx)
pushl #index
jmp first_plt_entry
The program linker will allocate an entry in the GOT for each entry in
the PLT. It will create a dynamic relocation for the GOT entry of type
JMP_SLOT. It will initialize the GOT entry to the base address of the
shared library plus the address of the second instruction in the code
sequence above. When the dynamic linker does the initial lazy binding
on a JMP_SLOT reloc, it will simply add the difference between the
shared library load address and the shared library base address to the
GOT entry. The effect is that the first jmp instruction will jump to
the second instruction, which will push the index entry and branch to
the first PLT entry. The first PLT entry is special, and looks like this:
pushl 4(%ebx)
jmp *8(%ebx)
This references the second and third entries in the GOT. The dynamic
linker will initialize them to have appropriate values for a callback
into the dynamic linker itself. The dynamic linker will use the index
pushed by the first code sequence to find the JMP_SLOT relocation.
When the dynamic linker determines the function to be called, it will
store the address of the function into the GOT entry references by the
first code sequence. Thus, the next time the function is called, the
jmp instruction will branch directly to the right code.

Why is .data executable?

According to readelf:
----------------------------------------------------------------------
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[24] .data PROGBITS 0000000000601040 00001040
0000000000000051 0000000000000000 WA 0 0 32
----------------------------------------------------------------------
Section to Segment mapping:
Segment Sections...
00
01
02
03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
----------------------------------------------------------------------
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR
INTERP
LOAD
LOAD 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x0000000000000281 0x0000000000000288 RW 200000
As you can see above .data segment has W (Write) and A (Alloc) permissions and .data is loaded in a LOAD section with R (Read) W (Write).
However, the shellcode in the .data section is executable, according to GDB:
0x601060 <bytecode>: xor rax,rax
=> 0x601063 <bytecode+3>: xor rdi,rdi
And I don't know why. Is this correct? What am I missing?
However, the shellcode in the .data section is executable, according to GDB:
The GDB output does not tell you that the .data section is executable. GDB will happily disassemble any memory you ask it to disassemble.
Try this:
(gdb) set $p = (void (*)(void))&bytecode
(gdb) call $p()
This should result in a SIGSEGV on the first instruction of bytecode, because it in fact is not executable.

Can _start be the thumb function?

Help me please with gnu assembler for arm926ejs cpu.
I try to build a simple program(test.S):
.global _start
_start:
mov r0, #2
bx lr
and success build it:
arm-none-linux-gnueabi-as -mthumb -o test.o test.S
arm-none-linux-gnueabi-ld -o test test.o
but when I run the program in the arm target linux environment, I get an error:
./test
Segmentation fault
What am I doing wrong?
Can _start function be the thumb func?
or
It is always arm func?
Can _start be a thumb function (in a Linux user program)?
Yes it can. The steps are not as simple as you may believe.
Please use the .code 16 as described by others. Also look at ARM Script predicate; my answer shows how to detect a thumb binary. The entry symbol must have the traditional _start+1 value or Linux will decide to call your _start in ARM mode.
Also your code is trying to emulate,
int main(void) { return 2; }
The _start symbol must not do this (as per auselen). To do _start to main() in ARM mode you need,
#include <linux/unistd.h>
static inline void exit(int status)
{
asm volatile ("mov r0, %0\n\t"
"mov r7, %1\n\t"
"swi #7\n\t"
: : "r" (status),
"Ir" (__NR_exit)
: "r0", "r7");
}
/* Wrapper for main return code. */
void __attribute__ ((unused)) estart (int argc, char*argv[])
{
int rval = main(argc,argv);
exit(rval);
}
/* Setup arguments for estart [like main()]. */
void __attribute__ ((naked)) _start (void)
{
asm(" sub lr, lr, lr\n" /* Clear the link register. */
" ldr r0, [sp]\n" /* Get argc... */
" add r1, sp, #4\n" /* ... and argv ... */
" b estart\n" /* Let's go! */
);
}
It is good to clear the lr so that stack traces will terminate. You can avoid the argc and argv processing if you want. The start shows how to work with this. The estart is just a wrapper to convert the main() return code to an exit() call.
You need to convert the above assembler to Thumb equivalents. I would suggest using gcc inline assembler. You can convert to pure assembler source if you get inlines to work. However, doing this in 'C' source is probably more practical, unless you are trying to make a very minimal executable.
Helpful gcc arguements are,
-nostartfiles -static -nostdlib -isystem <path to linux user headers>
Add -mthumb and you should have a harness for either mode.
Your problem is you end with
bx lr
and you expect Linux to take over after that. That exact line must be the cause of Segmentation fault.
You can try to create a minimal executable then try to bisect it to see the guts and understand how an executable is expected to behave.
See below for a working example:
.global _start
.thumb_func
_start:
mov r0, #42
mov r7, #1
svc #0
compile with
arm-linux-gnueabihf-as start.s -o start.o && arm-linux-gnueabihf-ld
start.o -o start_test
and dump to see the guts
$ arm-linux-gnueabihf-readelf -a -W start_test
Now you should notice the odd address of _start
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: ARM
Version: 0x1
Entry point address: 0x8055
Start of program headers: 52 (bytes into file)
Start of section headers: 160 (bytes into file)
Flags: 0x5000000, Version5 EABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 6
Section header string table index: 3
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00008054 000054 000006 00 AX 0 0 4
[ 2] .ARM.attributes ARM_ATTRIBUTES 00000000 00005a 000014 00 0 0 1
[ 3] .shstrtab STRTAB 00000000 00006e 000031 00 0 0 1
[ 4] .symtab SYMTAB 00000000 000190 0000e0 10 5 6 4
[ 5] .strtab STRTAB 00000000 000270 000058 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00008000 0x00008000 0x0005a 0x0005a R E 0x8000
Section to Segment mapping:
Segment Sections...
00 .text
There is no dynamic section in this file.
There are no relocations in this file.
There are no unwind sections in this file.
Symbol table '.symtab' contains 14 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00008054 0 SECTION LOCAL DEFAULT 1
2: 00000000 0 SECTION LOCAL DEFAULT 2
3: 00000000 0 FILE LOCAL DEFAULT ABS start.o
4: 00008054 0 NOTYPE LOCAL DEFAULT 1 $t
5: 00000000 0 FILE LOCAL DEFAULT ABS
6: 0001005a 0 NOTYPE GLOBAL DEFAULT 1 _bss_end__
7: 0001005a 0 NOTYPE GLOBAL DEFAULT 1 __bss_start__
8: 0001005a 0 NOTYPE GLOBAL DEFAULT 1 __bss_end__
9: 00008055 0 FUNC GLOBAL DEFAULT 1 _start
10: 0001005a 0 NOTYPE GLOBAL DEFAULT 1 __bss_start
11: 0001005c 0 NOTYPE GLOBAL DEFAULT 1 __end__
12: 0001005a 0 NOTYPE GLOBAL DEFAULT 1 _edata
13: 0001005c 0 NOTYPE GLOBAL DEFAULT 1 _end
No version information found in this file.
Attribute Section: aeabi
File Attributes
Tag_CPU_arch: v4T
Tag_THUMB_ISA_use: Thumb-1
here answer.
Thanks for all.
http://stuff.mit.edu/afs/sipb/project/egcs/src/egcs/gcc/config/arm/README-interworking
Calls via function pointers should use the BX instruction if the call is made in ARM mode:
.code 32
mov lr, pc
bx rX
This code sequence will not work in Thumb mode however, since the mov instruction will not set the bottom bit of the lr register. Instead a branch-and-link to the _call_via_rX functions should be used instead:
.code 16
bl _call_via_rX
where rX is replaced by the name of the register containing the function address.

how to get the actual address of `func` from `callq func#PLT`

In my Linux program, I need a function that takes an address addr and checks whether a callq instruction placed at addr is calling an specific function func loaded from a shared library. I mean, I need to check whether I have something like callq func#PLT at addr.
So, on Linux, how to reach the real address of a function func from a callq func#PLT instruction?
You can only find out about that at runtime, after the dynamic linker resolves the actual load address.
Warning: What follows is slightly deeper magic ...
To illustrate what's happening use a debugger:
#include <stdio.h>
int main(int argc, char **argv) { printf("Hello, World!\n"); return 0; }
Compile it (gcc -O8 ...). objdump -d on the binary shows (the optimization of printf() being substituted with puts() for a plain string not withstanding ...):
Disassembly of section .init:
[ ... ]
Disassembly of section .plt:
0000000000400408 <__libc_start_main#plt-0x10>:
400408: ff 35 a2 04 10 00 pushq 1049762(%rip) # 5008b0 <_GLOBAL_OFFSET_TABLE_+0x8>>
40040e: ff 25 a4 04 10 00 jmpq *1049764(%rip) # 5008b8 <_GLOBAL_OFFSET_TABLE_+0x10>
[ ... ]
0000000000400428 <puts#plt>:
400428: ff 25 9a 04 10 00 jmpq *1049754(%rip) # 5008c8 <_GLOBAL_OFFSET_TABLE_+0x20>
40042e: 68 01 00 00 00 pushq $0x1
400433: e9 d0 ff ff ff jmpq 400408 <_init+0x18>
[ ... ]
0000000000400500 <main>:
400500: 48 83 ec 08 sub $0x8,%rsp
400504: bf 0c 06 40 00 mov $0x40060c,%edi
400509: e8 1a ff ff ff callq 400428 <puts#plt>
40050e: 31 c0 xor %eax,%eax
400510: 48 83 c4 08 add $0x8,%rsp
400514: c3 retq
Now load it into gdb. Then:
$ gdb ./tcc
GNU gdb Red Hat Linux (6.3.0.0-0.30.1rh)
[ ... ]
(gdb) x/3i 0x400428
0x400428: jmpq *1049754(%rip) # 0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>
0x40042e: pushq $0x1
0x400433: jmpq 0x400408
(gdb) x/gx 0x5008c8
0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>: 0x000000000040042e
Notice this value points back to the instruction directly following the first jmpq; this means the puts#plt slot, on first invocation, will simply "fall through" to:
(gdb) x/3i 0x400408
0x400408: pushq 1049762(%rip) # 0x5008b0 <_GLOBAL_OFFSET_TABLE_+8>
0x40040e: jmpq *1049764(%rip) # 0x5008b8 <_GLOBAL_OFFSET_TABLE_+16>
0x400414: nop
(gdb) x/gx 0x5008b0
0x5008b0 <_GLOBAL_OFFSET_TABLE_+8>: 0x0000000000000000
(gdb) x/gx 0x5008b8
0x5008b8 <_GLOBAL_OFFSET_TABLE_+16>: 0x0000000000000000
The function address and argument aren't initialized yet.
This is the state just after program load, but before executing. Now start executing it:
(gdb) break main
Breakpoint 1 at 0x400500
(gdb) run
Starting program: tcc
(no debugging symbols found)
(no debugging symbols found)
Breakpoint 1, 0x0000000000400500 in main ()
(gdb) x/i 0x400428
0x400428: jmpq *1049754(%rip) # 0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>
(gdb) x/gx 0x5008c8
0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>: 0x000000000040042e
So this hasn't changed yet - but the targets (the GOT contents for the libc initialization) are different now:
(gdb) x/gx 0x5008b0
0x5008b0 <_GLOBAL_OFFSET_TABLE_+8>: 0x0000002a9566b9a8
(gdb) x/gx 0x5008b8
0x5008b8 <_GLOBAL_OFFSET_TABLE_+16>: 0x0000002a955609f0
(gdb) disas 0x0000002a955609f0
Dump of assembler code for function _dl_runtime_resolve:
0x0000002a955609f0 <_dl_runtime_resolve+0>: sub $0x38,%rsp
[ ... ]
I.e. at program load time, the dynamic linker will resolve the "init" parts first. It substitutes the GOT references with pointers that redirect into the dynamic linking code.
Therefore, when first calling an external-to-the-binary function through the .plt reference, it'll jump into the linker again. Let it do that, then inspect the program after that - the state has changed again:
(gdb) break *0x0000000000400514
Breakpoint 2 at 0x400514
(gdb) continue
Continuing.
Hello, World!
Breakpoint 2, 0x0000000000400514 in main ()
(gdb) x/i 0x400428
0x400428: jmpq *1049754(%rip) # 0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>
(gdb) x/gx 0x5008c8
0x5008c8 : 0x0000002a956c8870
(gdb) disas 0x0000002a956c8870
Dump of assembler code for function puts:
0x0000002a956c8870 <puts+0>: mov %rbx,0xffffffffffffffe0(%rsp)
[ ... ]
So there's your redirect right into libc now - the PLT reference to puts() finally got resolved.
The instructions to the linker where to insert the actual function load addresses (that we've seen it do for _dl_runtime_resolve comes from special sections in the ELF binary:
$ readelf -a tcc
[ ... ]
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
[ ... ]
INTERP 0x0000000000000200 0x0000000000400200 0x0000000000400200
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
[ ... ]
Dynamic section at offset 0x700 contains 21 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
[ ... ]
Relocation section '.rela.plt' at offset 0x3c0 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
0000005008c0 000100000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
0000005008c8 000200000007 R_X86_64_JUMP_SLO 0000000000000000 puts + 0
There's more to ELF than just the above, but these three pieces tell the kernel's binary format handler "this ELF binary has an interpreter" (which is the dynamic linker) that needs to be loaded / initialized first, that it requires libc.so.6, and that offsets 0x5008c0 and 0x5008c8 in the program's writeable data section must be substituted by the load addresses for __libc_start_main and puts, respectively, when the step of dynamic linking is actually performed.
How exactly that happens, from ELF's point of view, is up to the details of the interpreter (aka, the dynamic linker implementation).

How are variables in shared libraries referenced by loader?

I now understand how dynamic functions are referenced, by procedure linkage table like below:
Dump of assembler code for function foo#plt:
0x0000000000400528 <foo#plt+0>: jmpq *0x2004d2(%rip) # 0x600a00 <_GLOBAL_OFFSET_TABLE_+40>
0x000000000040052e <foo#plt+6>: pushq $0x2
0x0000000000400533 <foo#plt+11>: jmpq 0x4004f8
(gdb) disas 0x4004f8
No function contains specified address.
But I don't know how dynamic variables are referenced,though I found the values are populated in the GOT once started,but there's no stub like above,how does it work?
The dynamic loader relocates all references to variables before transferring control to the user program.
There is no "stub" for them, because once the user program starts executing, it is not possible for the loader to regain control and update variable addresses. If this isn't clear to you, then you have not really understood how the PLT lazy-resolution stub works.
Global variables are accessed indirectly, via a global offset table.
When compiling a program, the compiler generates code that performs
indirect accesses, and emits relocation information specifying the
entry in the global offset table being used.
The linker performs these relocations when creating the final
dynamically loadable object, resulting in machine code that does not
need further patching at load time.
To see this in action, consider the following code fragment.
int v1;
int f(void) { return !v1; }
The function f references a global v1. The machine code generated
for the function looks like the following (on an i386):
% gcc -c -fpic a.c
% objdump --disassemble --reloc a.o
[snip]
Disassembly of section .text:
00000000 <f>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: e8 fc ff ff ff call 4 <f+0x4>
4: R_386_PC32 __i686.get_pc_thunk.cx
8: 81 c1 02 00 00 00 add $0x2,%ecx
a: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
e: 8b 81 00 00 00 00 mov 0x0(%ecx),%eax
10: R_386_GOT32 v1
14: 8b 00 mov (%eax),%eax
16: 85 c0 test %eax,%eax
18: 0f 94 c0 sete %al
1b: 0f b6 c0 movzbl %al,%eax
1e: 5d pop %ebp
1f: c3 ret
Disassembly of section .text.__i686.get_pc_thunk.cx:
00000000 <__i686.get_pc_thunk.cx>:
0: 8b 0c 24 mov (%esp),%ecx
3: c3 ret
Machine code walk-through:
(Offsets 0x0 and 0x1) The standard function prologue.
(Offset 0x3) The call to __i686.get_pc_thunk.cx prepares for
PC-relative addressing by loading the address of the instruction
after the call into register %ecx.
(Offset 0x8) The value in %ecx is adjusted to point to the start
of the global offset table. This adjustment is signalled by the
relocation entry of type R_386_GOTPC.
(Offset 0xE) The address of global v1 is retrieved. The
R_386_GOT32 relocation supplies the offset of v1's entry from
the base of the global offset table.
(Offset 0x14) The value in v1 is retrieved into register %eax.
(Offsets 0x16--0x1F) The rest of the computation for function f.
In the final shared object, the linker patches the function's code to
the following:
% gcc -shared -o a.so a.o
% objdump --disassemble a.so
...snip...
0000044c <f>:
44c: 55 push %ebp
44d: 89 e5 mov %esp,%ebp
44f: e8 18 00 00 00 call 46c <__i686.get_pc_thunk.cx>
454: 81 c1 a0 1b 00 00 add $0x1ba0,%ecx
45a: 8b 81 f8 ff ff ff mov -0x8(%ecx),%eax
460: 8b 00 mov (%eax),%eax
462: 85 c0 test %eax,%eax
...snip...
Assuming that the object gets loaded at offset O in memory, the
call instruction at offset 0x44F will load O+0x454+0x1BA0, i.e.,
O+0x1FF4 into %ecx.
The instruction at offset 0x45A subtracts 8 from %ecx
to get the address of the slot for v1 in the global offset table,
i.e., the slot for v1 is at offset 0x1FEC from the start of the
shared object.
Looking at the dynamic relocation records for the shared object, we
see a relocation record instructing the runtime loader to store the
actual address for v1 at offset 0x1FEC.
% objdump -R a.so
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
...snip...
00001fec R_386_GLOB_DAT v1
...snip...
Further reading:
Pat Beirne's "Study of ELF loading and relocs" has more information about ELF relocations.

Resources