Why does gcc generate a PLT when it is apparently not needed?

Consider this code:
int foo();
int main() {
foo();
while(1){}
}
int foo() is implemented in a shared object.
Compiling this code with gcc -o main main.c -lfoo -nostdlib -m32 -O2 -e main --no-pic -L./shared gives the following disassembly:
$ objdump -d ./main
./main: file format elf32-i386
Disassembly of section .plt:
00000240 <.plt>:
240: ff b3 04 00 00 00 pushl 0x4(%ebx)
246: ff a3 08 00 00 00 jmp *0x8(%ebx)
24c: 00 00 add %al,(%eax)
...
00000250 <foo@plt>:
250: ff a3 0c 00 00 00 jmp *0xc(%ebx)
256: 68 00 00 00 00 push $0x0
25b: e9 e0 ff ff ff jmp 240 <.plt>
Disassembly of section .text:
00000260 <main>:
260: 8d 4c 24 04 lea 0x4(%esp),%ecx
264: 83 e4 f0 and $0xfffffff0,%esp
267: ff 71 fc pushl -0x4(%ecx)
26a: 55 push %ebp
26b: 89 e5 mov %esp,%ebp
26d: 51 push %ecx
26e: 83 ec 04 sub $0x4,%esp
271: e8 fc ff ff ff call 272 <main+0x12>
276: eb fe jmp 276 <main+0x16>
With the following relocations:
$ objdump -R ./main
./main: file format elf32-i386
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
00000272 R_386_PC32 foo
00001ffc R_386_JUMP_SLOT foo
Note that:
The code was compiled with --no-pic, so it is not PIC
The call to foo() in the .text section (main function) does not go through the PLT. Instead, it has a plain R_386_PC32 relocation, which I assume will be resolved directly to the address of foo at load time. That makes sense to me: the code is not PIC, so there is no need for an extra indirection through the PLT.
Even though it is unused, the PLT is still generated. It contains an entry for foo, and there is even an R_386_JUMP_SLOT relocation to set up foo's GOT entry (which the PLT entry jumps through) at load time.
My question is simple: I don't see the PLT being used anywhere in the code, and I don't see why it would be necessary here, so why does gcc create it?

--no-pic isn't like -no-pie; it seems to be a synonym for -fno-pic or -fno-pie, affecting code-gen but not linking. Assuming your distro's GCC defaults to making a PIE, you are still linking a PIE, so the linker doesn't convert the call into call foo@plt.
I get a linker warning: /tmp/ccyRsNtd.o: warning: relocation against 'getpid@@GLIBC_2.0' in read-only section '.text.startup' / warning: creating DT_TEXTREL in a PIE. (But the executable does run, unlike in 64-bit mode, where a call rel32 can't be relocated to reach the whole address space.)
And yeah, there is an unused PLT entry built by ld for some reason, but the way you're linking is totally nonstandard.
The normal reason for building a PLT is:
When linking a non-PIE, ld converts call foo into call foo@plt rather than leaving a text relocation at every call site, which would require runtime fixups every time the program loads.
Use -fno-plt to get more efficient asm, especially for 64-bit mode where even PIE code can efficiently reference the GOT directly.
To make a simpler example, I used a function in libc (getpid) instead of a custom library. Compiling normally with gcc -fno-pie -no-pie -m32 -O2 foo.c, I get 5-byte e8 d5 ff ff ff call rel32: call 8049040 <getpid@plt>.
But adding -fno-plt to that, I get 6-byte ff 15 f4 bf 04 08 call [disp32] - call DWORD PTR ds:0x804bff4. No PLT involved, just the GOT entry referenced with an absolute address.
No runtime relocation needed; this page of the .text section can stay "clean" as a file-backed private mapping of the executable. (Runtime relocation would dirty it, making it backed only by swap space if the kernel wanted to evict that page.)
Also, it uses a "normal" GOT entry which needs early binding. This does work even with -nostdlib -lc and the ill-advised -e main instead of calling it _start like a normal person. Since it's a dynamically linked executable, the dynamic linker does run before your entry point and set up the GOT.
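To make the difference between a lazily bound PLT call and an early-bound GOT call concrete, here is a toy model in plain C. It is only a sketch: real_foo, resolve_foo and the two got_* slots are hypothetical stand-ins for the library function, the dynamic linker's symbol lookup, and GOT entries; nothing here touches the real loader.

```c
#include <stddef.h>

typedef int (*fn_ptr)(void);

/* Hypothetical stand-in for a function that lives in a shared library. */
static int real_foo(void) { return 42; }

static int resolver_calls = 0;   /* how many times the pretend symbol lookup ran */

/* Stands in for the dynamic linker resolving the symbol "foo". */
static fn_ptr resolve_foo(void) { resolver_calls++; return real_foo; }

static int lazy_stub(void);      /* plays the role of the PLT's push/jmp resolver path */

/* One pretend GOT slot per binding style. */
static fn_ptr got_lazy  = lazy_stub;  /* lazy: initially points back at the stub */
static fn_ptr got_eager = NULL;       /* early: filled in before the entry point runs */

static int lazy_stub(void) {
    got_lazy = resolve_foo();    /* patch the slot so later calls skip the resolver */
    return got_lazy();
}

/* What the loader does at startup for early-bound entries. */
void startup_bind(void) { got_eager = resolve_foo(); }

int call_via_plt(void)        { return got_lazy(); }   /* like call foo@plt -> jmp *GOT[foo] */
int call_via_got(void)        { return got_eager(); }  /* like call *GOT[foo] (-fno-plt style) */
int resolver_call_count(void) { return resolver_calls; }
```

In this model the early-bound slot pays for its lookup once in startup_bind, while the lazy slot pays on the first call_via_plt and never again.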

Related

Can't call C standard library function on 64-bit Linux from assembly (yasm) code

I have a function foo written in assembly and compiled with yasm and GCC on Linux (Ubuntu) 64-bit. It simply prints a message to stdout using puts(), here is how it looks:
bits 64
extern puts
global foo
section .data
message:
db 'foo() called', 0
section .text
foo:
push rbp
mov rbp, rsp
lea rdi, [rel message]
call puts
pop rbp
ret
It is called by a C program compiled with GCC:
extern void foo();
int main() {
foo();
return 0;
}
Build commands:
yasm -f elf64 foo_64_unix.asm
gcc -c foo_main.c -o foo_main.o
gcc foo_64_unix.o foo_main.o -o foo
./foo
Here is the problem:
When running the program it prints an error message and immediately segfaults during the call to puts:
./foo: Symbol `puts' causes overflow in R_X86_64_PC32 relocation
Segmentation fault
After disassembling with objdump I see that the call is made with the wrong address:
0000000000000660 <foo>:
660: 90 nop
661: 55 push %rbp
662: 48 89 e5 mov %rsp,%rbp
665: 48 8d 3d a4 09 20 00 lea 0x2009a4(%rip),%rdi
66c: e8 00 00 00 00 callq 671 <foo+0x11> <-- here
671: 5d pop %rbp
672: c3 retq
(671 is the address of the next instruction, not address of puts)
However, if I rewrite the same code in C the call is done differently:
645: e8 c6 fe ff ff callq 510 <puts@plt>
i.e. it references puts from the PLT.
Is it possible to tell yasm to generate similar code?
TL;DR: 3 options:
Build a non-PIE executable (gcc -no-pie -fno-pie call-lib.c libcall.o) so the linker will generate a PLT entry for you transparently when you write call puts.
call puts wrt ..plt like gcc -fPIE would do.
call [rel puts wrt ..got] like gcc -fno-plt would do.
The latter two will work in PIE executables or shared libraries. The 3rd way, wrt ..got, is slightly more efficient.
Your gcc is building PIE executables by default (32-bit absolute addresses no longer allowed in x86-64 Linux?).
I'm not sure why, but when doing so the linker doesn't automatically resolve call puts to call puts@plt. There is still a puts PLT entry generated, but the call doesn't go there.
At runtime, the dynamic linker tries to resolve puts directly to the libc symbol of that name and fixup the call rel32. But the symbol is more than +-2^31 away, so we get a warning about overflow of the R_X86_64_PC32 relocation. The low 32 bits of the target address are correct, but the upper bits aren't. (Thus your call jumps to a bad address).
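The overflow can be reproduced with ordinary integer arithmetic. The sketch below uses made-up addresses (a PIE mapped near 0x5555... and libc near 0x7fff...); the function names are illustrative and not part of any loader API.

```c
#include <stdint.h>
#include <stdbool.h>

/* Displacement a call rel32 needs: target minus the address of the next instruction. */
int64_t call_displacement(uint64_t call_addr, uint64_t target) {
    return (int64_t)(target - (call_addr + 5));   /* call rel32 is 5 bytes long */
}

/* True if the displacement can be encoded in the signed 32-bit field. */
bool fits_in_rel32(int64_t disp) {
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Where the call actually lands if only the low 32 bits of the displacement are stored. */
uint64_t truncated_call_target(uint64_t call_addr, uint64_t target) {
    int32_t stored = (int32_t)(uint32_t)call_displacement(call_addr, target);
    return call_addr + 5 + (uint64_t)(int64_t)stored;
}
```

With a call at the hypothetical address 0x55555555466c and puts at 0x7ffff7e4a5a0, fits_in_rel32 reports false, and truncated_call_target returns an address whose low 32 bits match puts but whose upper bits do not, which is the bad jump described above.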
Your code works for me if I build with gcc -no-pie -fno-pie call-lib.c libcall.o. The -no-pie is the critical part: it's the linker option. Your YASM command doesn't have to change.
When making a traditional position-dependent executable, the linker turns the puts symbol for the call target into puts@plt for you, because we're linking a dynamic executable (instead of statically linking libc with gcc -static -fno-pie, in which case the call could go directly to the libc function.)
Anyway, this is why gcc emits call puts@plt (GAS syntax) when compiling with -fpie (the default on your desktop, but not the default on https://godbolt.org/), but just call puts when compiling with -fno-pie.
See What does @plt mean here? for more about the PLT, and also Sorry state of dynamic libraries on Linux from a few years ago. (The modern gcc -fno-plt is like one of the ideas in that blog post.)
BTW, a more accurate/specific prototype would let gcc avoid zeroing EAX before calling foo:
extern void foo(); in C means extern void foo(...);
You could declare it as extern void foo(void);, which is what () means in C++. C++ doesn't allow function declarations that leave the args unspecified.
asm improvements
You can also put message in section .rodata (read-only data, linked as part of the text segment).
You don't need a stack frame, just something to align the stack by 16 before a call. A dummy push rax will do it.
Or we can tail-call puts by jumping to it instead of calling it, with the same stack position as on entry to this function. This works with or without PIE. Just replace call with jmp, as long as RSP is pointing at your own return address.
If you want to make PIE executables (or shared libraries), you have two options
call puts wrt ..plt - explicitly call through the PLT.
call [rel puts wrt ..got] - explicitly do an indirect call through the GOT entry, like gcc's -fno-plt style of code-gen. (Using a RIP-relative addressing mode to reach the GOT, hence the rel keyword).
WRT = With Respect To. The NASM manual documents wrt ..plt, and see also section 7.9.3: special symbols and WRT.
Normally you would use default rel at the top of your file so you can actually use call [puts wrt ..got] and still get a RIP-relative addressing mode. You can't use a 32-bit absolute addressing mode in PIE or PIC code.
call [puts wrt ..got] assembles to a memory-indirect call using the function pointer that dynamic linking stored in the GOT. (Early-binding, not lazy dynamic linking.)
NASM documents ..got for getting the address of variables in section 9.2.3. Functions in (other) libraries are identical: you get a pointer from the GOT instead of calling directly, because the offset isn't a link-time constant and might not fit in 32-bits.
YASM also accepts call [puts wrt ..GOTPCREL], like AT&T syntax call *puts@GOTPCREL(%rip), but NASM does not.
; don't use BITS 64. You *want* an error if you try to assemble this into a 32-bit .o
default rel ; RIP-relative addressing instead of 32-bit absolute by default; makes the [rel ...] optional
extern puts
section .rodata ; .rodata is best for constants, not .data
message:
    db 'foo() called', 0
section .text
global foo
foo:
    sub rsp, 8 ; align the stack by 16
    ; PIE with PLT
    lea rdi, [rel message] ; needed for PIE
    call puts WRT ..plt ; call through the PLT
;or
    ; PIE with -fno-plt style code, skips the PLT indirection
    lea rdi, [rel message]
    call [rel puts wrt ..got]
;or
    ; non-PIE
    mov edi, message ; more efficient, but only works in non-PIE / non-PIC
    call puts ; linker will rewrite it into call puts@plt
    add rsp,8 ; restore the stack, undoing the sub
    ret
In a position-dependent Linux executable, you can use mov edi, message instead of a RIP-relative LEA. It's smaller code-size and can run on more execution ports on most CPUs. (Fun fact: MacOS always puts the "image base" outside the low 4GiB so this optimization isn't possible there.)
In a non-PIE executable, you also might as well use call puts or jmp puts and let the linker sort it out, unless you want more efficient no-plt style dynamic linking. But if you do choose to statically link libc, I think this is the only way you'll get a direct jmp to the libc function.
(I think the possibility of static linking for non-PIE is why ld is willing to generate PLT stubs automatically for non-PIE, but not for PIE or shared libraries. It requires you to say what you mean when linking ELF shared objects.)
If you did use call puts in a PIE (call rel32), it could only work if you statically linked a position-independent implementation of puts into your PIE, so the entire thing was one executable that would get loaded at a random address at runtime (by the usual dynamic-linker mechanism), but simply didn't have a dependency on libc.so.6
Linker "relaxing" calls when the target is present at static-link time
GAS call *bar@GOTPCREL(%rip) uses R_X86_64_GOTPCRELX (relaxable)
NASM call [rel bar wrt ..got] uses R_X86_64_GOTPCREL (not relaxable)
This is less of a problem with hand-written asm; you can just use call bar when you know the symbol will be present in another .o (rather than .so) that you're going to link. But C compilers don't know the difference between library functions and other user functions you declare with prototypes (unless you use stuff like gcc -fvisibility=hidden https://gcc.gnu.org/wiki/Visibility or attributes / pragmas).
Still, you might want to write asm source that the linker can optimize if you statically link a library, but AFAIK you can't do that with NASM. You can export a symbol as hidden (visible at static-link time, but not for dynamic linking in the final .so) with global bar:function hidden, but that's in the source file defining the function, not files accessing it.
; first file: defines bar
global bar
bar:
    mov eax,231
    syscall

; second file: calls bar both ways
extern bar
call bar wrt ..plt
call [rel bar wrt ..got]
The 2nd file, after assembling with nasm -felf64 and disassembling with objdump -drwc -Mintel to see the relocations:
0000000000000000 <.text>:
0: e8 00 00 00 00 call 0x5 1: R_X86_64_PLT32 bar-0x4
5: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # 0xb 7: R_X86_64_GOTPCREL bar-0x4
After linking with ld (GNU Binutils) 2.35.1 - ld bar.o bar2.o -o bar
0000000000401000 <_start>:
401000: e8 0b 00 00 00 call 401010 <bar>
401005: ff 15 ed 1f 00 00 call QWORD PTR [rip+0x1fed] # 402ff8 <.got>
40100b: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0]
0000000000401010 <bar>:
401010: b8 e7 00 00 00 mov eax,0xe7
401015: 0f 05 syscall
Note that the PLT form got relaxed to just a direct call bar, PLT eliminated. But the ff 15 call [rel mem] was not relaxed to an e8 rel32
With GAS:
_start:
call bar@plt
call *bar@GOTPCREL(%rip)
gcc -c foo.s && disas foo.o
0000000000000000 <_start>:
0: e8 00 00 00 00 call 5 <_start+0x5> 1: R_X86_64_PLT32 bar-0x4
5: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # b <_start+0xb> 7: R_X86_64_GOTPCRELX bar-0x4
Note the X at the end of R_X86_64_GOTPCRELX.
ld bar2.o foo.o -o bar && disas bar:
0000000000401000 <bar>:
401000: b8 e7 00 00 00 mov eax,0xe7
401005: 0f 05 syscall
0000000000401007 <_start>:
401007: e8 f4 ff ff ff call 401000 <bar>
40100c: 67 e8 ee ff ff ff addr32 call 401000 <bar>
Both calls got relaxed to a direct e8 call rel32 straight to the target address. The extra byte of the indirect call is filled with a 67 address-size prefix (which has no effect on call rel32), padding the instruction to the same length. (By link time it's too late to re-assemble and recompute all the relative branches within functions, alignment, and so on.)
The same would happen for call *puts@GOTPCREL(%rip) if you statically linked libc with gcc -static.
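The byte-level rewrite can be sketched in C. This is an illustration of the transformation seen in the disassembly above, not ld's actual implementation; relax_got_call is a hypothetical name.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Rewrite a 6-byte call [rip+disp32] (ff 15 disp32) located at insn_addr
 * into addr32 call rel32 (67 e8 rel32) aimed straight at target.
 * Returns false and leaves the bytes alone if the opcode or range does not match.
 */
bool relax_got_call(uint8_t insn[6], uint64_t insn_addr, uint64_t target) {
    if (insn[0] != 0xff || insn[1] != 0x15)
        return false;                        /* not call [rip+disp32] */

    /* rel32 is relative to the end of the 6-byte slot, prefix included */
    int64_t disp = (int64_t)(target - (insn_addr + 6));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return false;                        /* target unreachable with rel32 */

    uint32_t rel = (uint32_t)(int32_t)disp;
    insn[0] = 0x67;                          /* address-size prefix, used purely as padding */
    insn[1] = 0xe8;                          /* call rel32 */
    insn[2] = (uint8_t)rel;                  /* little-endian displacement */
    insn[3] = (uint8_t)(rel >> 8);
    insn[4] = (uint8_t)(rel >> 16);
    insn[5] = (uint8_t)(rel >> 24);
    return true;
}
```

Feeding it an ff 15 call at 0x40100c with target 0x401000 yields 67 e8 ee ff ff ff, the same bytes as in the linked output above.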
The 0xe8 opcode is followed by a signed offset to be applied to the PC (which has advanced to the next instruction by that time) to compute the branch target. Hence objdump is interpreting the branch target as 0x671.
YASM is rendering zeros because it has likely put a relocation on that offset, which is how it asks the loader to populate the correct offset for puts during loading. The loader is encountering an overflow when computing the relocation, which may indicate that puts is at a further offset from your call than can be represented in a 32-bit signed offset. Hence the loader fails to fix this instruction, and you get a crash.
66c: e8 00 00 00 00 shows the unpopulated address. If you look in your relocation table, you should see a relocation on 0x66d. It is not uncommon for the assembler to populate addresses/offsets with relocations as all zeros.
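The target computation objdump performs can be written out in a few lines of C. This is a minimal sketch; call_rel32_target is an illustrative helper, not an objdump or libc function.

```c
#include <stdint.h>

/* Decode e8 rel32 at addr: target = address of the next instruction
   plus the signed, little-endian 32-bit displacement. */
uint64_t call_rel32_target(uint64_t addr, const uint8_t insn[5]) {
    uint32_t raw = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;
    int32_t rel = (int32_t)raw;              /* reinterpret the field as signed */
    return addr + 5 + (uint64_t)(int64_t)rel;
}
```

For the unresolved e8 00 00 00 00 at 0x66c this gives 0x671, and for e8 c6 fe ff ff at 0x645 it gives 0x510, matching the two disassembly lines in the question.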
This page suggests that YASM has a WRT directive that can control use of .got, .plt, etc.
Per section 9.2.5 of the NASM documentation, it looks like you can use CALL puts WRT ..plt (presuming YASM has the same syntax).

How to access a C global variable through GOT in GAS assembly on x86-64 Linux?

My problem
I am trying to write a shared library (not an executable, so please do not tell me to use -no-pie) with assembly and C in separate files (not inline assembly).
And I would like to access a C global variable through the Global Offset Table in assembly code, because the function called might be defined in any other shared library.
I know the PLT/GOT stuff, but I do not know for sure how to tell the compiler to correctly generate relocation information for the linker (what is the syntax?), and how to tell the linker to actually relocate my code with that information (what are the linker options?).
My code compiles with a linking error
/bin/ld: tracer.o: relocation R_X86_64_PC32 against
/bin/ld: final link failed: bad value
Furthermore, it would be great if someone could share some detailed documentation on relocation in GAS assembly. For example, an exhaustive list of how to interoperate between C and assembly with the GNU assembler.
Source Code
Compile the C and assembly code and link them into ONE shared library.
# Makefile
liba.so: tracer2.S target2.c
gcc -shared -g -o liba.so tracer2.S target2.c
// target2.c
// NOTE: This is a variable, not a function.
int (*read_original)(int fd, void *data, unsigned long size) = 0;
// tracer2.S
.text
// external symbol declaration
.global read_original
read:
lea read_original(%rip), %rax
mov (%rax), %rax
jmp *%rax
Expectation and Result
I expect the linker to happily link my object files but it says
g++ -shared -g -o liba.so tracer2.o target2.c -ldl
/bin/ld: tracer.o: relocation R_X86_64_PC32 against
/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
make: *** [Makefile:2: liba.so] Error 1
and commenting out the line
// lea read_original(%rip), %rax
makes the error disappear.
Solution.
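lea read_original@GOTPCREL(%rip), %rax
The @GOTPCREL suffix tells the assembler to emit a PC-relative relocation against the symbol's GOT entry. The linker will calculate the offset from the current rip to the target GOT entry. Note that this lea produces the address of the GOT entry itself; the entry in turn holds the address of read_original.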
You can verify with
$ objdump -d liba.so
10e9: 48 8d 05 f8 2e 00 00 lea 0x2ef8(%rip),%rax # 3fe8 <read_original@@Base-0x40>
10f0: 48 8b 00 mov (%rax),%rax
10f3: ff e0 jmpq *%rax
Thanks to Peter.
Some information that might be related or not
1. I can call a C function with
call read@plt
objdump shows it calls into the correct PLT entry.
$ objdump -d liba.so
...
0000000000001109 <read1>:
1109: e8 22 ff ff ff callq 1030 <read@plt>
110e: ff e0 jmpq *%rax
2. I can lea a PLT entry address correctly
0xffffff23 is -0xdd, 0x1109 - 0xdd = 102c
0000000000001020 <.plt>:
1020: ff 35 e2 2f 00 00 pushq 0x2fe2(%rip) # 4008 <_GLOBAL_OFFSET_TABLE_+0x8>
1026: ff 25 e4 2f 00 00 jmpq *0x2fe4(%rip) # 4010 <_GLOBAL_OFFSET_TABLE_+0x10>
102c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000001030 <read@plt>:
1030: ff 25 e2 2f 00 00 jmpq *0x2fe2(%rip) # 4018 <read@GLIBC_2.2.5>
1036: 68 00 00 00 00 pushq $0x0
103b: e9 e0 ff ff ff jmpq 1020 <.plt>
0000000000001109 <read1>:
1109: 48 8d 04 25 23 ff ff lea 0xffffffffffffff23,%rax
1110: ff
1111: ff e0 jmpq *%rax
Environment
Arch Linux 20190809
$ uname -a
Linux alex-arch 5.2.6-arch1-1-ARCH #1 SMP PREEMPT Sun Aug 4 14:58:49 UTC 2019 x86_64 GNU/Linux
$ gcc -v
Using built-in specs.
COLLECT_GCC=/bin/gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp --enable-cet=auto
Thread model: posix
gcc version 9.1.0 (GCC)
$ ld --version
GNU ld (GNU Binutils) 2.32
Copyright (C) 2019 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
Apparently the linker enforces global vs. hidden visibility for symbols in ELF shared objects, not allowing "back door" access to symbols that participate in symbol-interposition (and thus can potentially be more than 2GB away.)
To access it directly from other code in the same shared object with normal RIP-relative addressing, make the symbol hidden by setting its ELF visibility as such. (See also https://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/ and Ulrich Drepper's How to Write Shared Libraries)
__attribute__ ((visibility("hidden")))
int (*read_original)(int fd, void *data, unsigned long size) = 0;
Then gcc -save-temps tracer2.S target2.c -shared -fPIC compiles/assembles + links a shared library. GCC also has options like -fvisibility=hidden that make that the default, requiring explicit attributes on symbols you do want to export for dynamic linking. That's a very good idea if you have any globals that you use inside your library, to get the compiler to emit efficient code for using them. It also protects you from global name-clashes with other libraries. The GCC manual strongly recommends it.
It also works with g++; C++ name mangling only applies to function names, not variables (including function-pointers). But generally don't compile .c files with a C++ compiler.
If you do want to support symbol interposition, you need to use the GOT; obviously you can just look at how the compiler does it:
int glob; // with default visibility = default
int foo() { return glob; }
compiles to this asm with GCC -O3 -fPIC (without any visibility options, so global symbols are fully globally visible: exported from shared objects and participating in symbol interposition).
foo:
movq glob@GOTPCREL(%rip), %rax
movl (%rax), %eax
ret
Obviously this is less efficient than mov glob(%rip), %eax so prefer keeping your global vars scoped to the library (hidden), not truly global.
There are tricks you can do with weak aliases to let you export a symbol that only this library defines, and access that definition efficiently via a "hidden" alias.
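A rough sketch of the alias idea, using GCC's alias and visibility attributes on ELF targets. For simplicity it uses a plain (non-weak) hidden alias rather than a weak one, and the names shared_counter, shared_counter_local and bump_counter are made up for illustration:

```c
/* Exported definition: other modules can see it (and could, in principle,
 * interpose their own shared_counter ahead of this one). */
int shared_counter = 0;

/* Hidden alias for the same object. References through this name always
 * bind to the definition above, so PIC code needs no GOT lookup for it. */
extern int shared_counter_local
    __attribute__((alias("shared_counter"), visibility("hidden")));

/* Library-internal code uses the hidden name for efficient access. */
int bump_counter(void)
{
    return ++shared_counter_local;
}
```

The trade-off is that code inside the library no longer sees an interposed definition of shared_counter, which is usually what you want for a symbol only this library defines.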

Can't call C standard library function on 64-bit Linux from assembly (yasm) code

I have a function foo written in assembly and compiled with yasm and GCC on Linux (Ubuntu) 64-bit. It simply prints a message to stdout using puts(), here is how it looks:
bits 64
extern puts
global foo
section .data
message:
db 'foo() called', 0
section .text
foo:
push rbp
mov rbp, rsp
lea rdi, [rel message]
call puts
pop rbp
ret
It is called by a C program compiled with GCC:
extern void foo();
int main() {
foo();
return 0;
}
Build commands:
yasm -f elf64 foo_64_unix.asm
gcc -c foo_main.c -o foo_main.o
gcc foo_64_unix.o foo_main.o -o foo
./foo
Here is the problem:
When running the program it prints an error message and immediately segfaults during the call to puts:
./foo: Symbol `puts' causes overflow in R_X86_64_PC32 relocation
Segmentation fault
After disassembling with objdump I see that the call is made with the wrong address:
0000000000000660 <foo>:
660: 90 nop
661: 55 push %rbp
662: 48 89 e5 mov %rsp,%rbp
665: 48 8d 3d a4 09 20 00 lea 0x2009a4(%rip),%rdi
66c: e8 00 00 00 00 callq 671 <foo+0x11> <-- here
671: 5d pop %rbp
672: c3 retq
(671 is the address of the next instruction, not address of puts)
However, if I rewrite the same code in C the call is done differently:
645: e8 c6 fe ff ff callq 510 <puts@plt>
i.e. it references puts from the PLT.
Is it possible to tell yasm to generate similar code?
TL:DR: 3 options:
Build a non-PIE executable (gcc -no-pie -fno-pie call-lib.c libcall.o) so the linker will generate a PLT entry for you transparently when you write call puts.
call puts wrt ..plt like gcc -fPIE would do.
call [rel puts wrt ..got] like gcc -fno-plt would do.
The latter two will work in PIE executables or shared libraries. The 3rd way, wrt ..got, is slightly more efficient.
Your gcc is building PIE executables by default (32-bit absolute addresses no longer allowed in x86-64 Linux?).
I'm not sure why, but when doing so the linker doesn't automatically resolve call puts to call puts@plt. There is still a puts PLT entry generated, but the call doesn't go there.
At runtime, the dynamic linker tries to resolve puts directly to the libc symbol of that name and fix up the call rel32. But the symbol is more than ±2^31 bytes away, so we get a warning about overflow of the R_X86_64_PC32 relocation. The low 32 bits of the target address are correct, but the upper bits aren't. (Thus your call jumps to a bad address.)
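To make the overflow concrete, here is a small sketch of the arithmetic behind a PC-relative 32-bit displacement. The function names are hypothetical, and this only models the range check and the truncation, not the dynamic linker's actual code:

```c
#include <stdint.h>

/* Returns 1 if a `call rel32` whose next instruction is at next_ip can
 * reach target, i.e. the displacement fits in a signed 32-bit field. */
int rel32_reaches(uint64_t next_ip, uint64_t target)
{
    int64_t disp = (int64_t)(target - next_ip);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Where the CPU actually jumps if the displacement is truncated to
 * 32 bits and then sign-extended again at execution time. */
uint64_t truncated_destination(uint64_t next_ip, uint64_t target)
{
    int32_t rel32 = (int32_t)(uint32_t)(target - next_ip);
    return next_ip + (uint64_t)(int64_t)rel32;
}
```

With made-up but typical addresses (a PIE mapped near 0x555555554000 and libc near 0x7ffff7e00000), rel32_reaches returns 0, and truncated_destination gives an address whose low 32 bits match the real puts but whose upper bits do not.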
Your code works for me if I build with gcc -no-pie -fno-pie call-lib.c libcall.o. The -no-pie is the critical part: it's the linker option. Your YASM command doesn't have to change.
When making a traditional position-dependent executable, the linker turns the puts symbol for the call target into puts@plt for you, because we're linking a dynamic executable (instead of statically linking libc with gcc -static -fno-pie, in which case the call could go directly to the libc function.)
Anyway, this is why gcc emits call puts@plt (GAS syntax) when compiling with -fpie (the default on your desktop, but not the default on https://godbolt.org/), but just call puts when compiling with -fno-pie.
See What does @plt mean here? for more about the PLT, and also Sorry state of dynamic libraries on Linux from a few years ago. (The modern gcc -fno-plt is like one of the ideas in that blog post.)
BTW, a more accurate/specific prototype would let gcc avoid zeroing EAX before calling foo:
extern void foo(); in C means extern void foo(...);
You could declare it as extern void foo(void);, which is what () means in C++. C++ doesn't allow function declarations that leave the args unspecified.
asm improvements
You can also put message in section .rodata (read-only data, linked as part of the text segment).
You don't need a stack frame, just something to align the stack by 16 before a call. A dummy push rax will do it.
Or we can tail-call puts by jumping to it instead of calling it, with the same stack position as on entry to this function. This works with or without PIE. Just replace call with jmp, as long as RSP is pointing at your own return address.
If you want to make PIE executables (or shared libraries), you have two options
call puts wrt ..plt - explicitly call through the PLT.
call [rel puts wrt ..got] - explicitly do an indirect call through the GOT entry, like gcc's -fno-plt style of code-gen. (Using a RIP-relative addressing mode to reach the GOT, hence the rel keyword).
WRT = With Respect To. The NASM manual documents wrt ..plt, and see also section 7.9.3: special symbols and WRT.
Normally you would use default rel at the top of your file so you can actually use call [puts wrt ..got] and still get a RIP-relative addressing mode. You can't use a 32-bit absolute addressing mode in PIE or PIC code.
call [puts wrt ..got] assembles to a memory-indirect call using the function pointer that dynamic linking stored in the GOT. (Early-binding, not lazy dynamic linking.)
NASM documents ..got for getting the address of variables in section 9.2.3. Functions in (other) libraries are identical: you get a pointer from the GOT instead of calling directly, because the offset isn't a link-time constant and might not fit in 32-bits.
YASM also accepts call [puts wrt ..GOTPCREL], like AT&T syntax call *puts@GOTPCREL(%rip), but NASM does not.
; don't use BITS 64. You *want* an error if you try to assemble this into a 32-bit .o
default rel ; RIP-relative addressing instead of 32-bit absolute by default; makes the [rel ...] optional
section .rodata ; .rodata is best for constants, not .data
message:
db 'foo() called', 0
section .text
global foo
extern puts ; puts is defined in libc
foo:
sub rsp, 8 ; align the stack by 16
; PIE with PLT
lea rdi, [rel message] ; needed for PIE
call puts WRT ..plt ; call puts through its PLT entry
;or
; PIE with -fno-plt style code, skips the PLT indirection
lea rdi, [rel message]
call [rel puts wrt ..got]
;or
; non-PIE
mov edi, message ; more efficient, but only works in non-PIE / non-PIC
call puts ; linker will rewrite it into call puts@plt
add rsp,8 ; restore the stack, undoing the sub
ret
In a position-dependent Linux executable, you can use mov edi, message instead of a RIP-relative LEA. It's smaller code-size and can run on more execution ports on most CPUs. (Fun fact: MacOS always puts the "image base" outside the low 4GiB so this optimization isn't possible there.)
In a non-PIE executable, you also might as well use call puts or jmp puts and let the linker sort it out, unless you want more efficient no-plt style dynamic linking. But if you do choose to statically link libc, I think this is the only way you'll get a direct jmp to the libc function.
(I think the possibility of static linking for non-PIE is why ld is willing to generate PLT stubs automatically for non-PIE, but not for PIE or shared libraries. It requires you to say what you mean when linking ELF shared objects.)
If you did use call puts in a PIE (call rel32), it could only work if you statically linked a position-independent implementation of puts into your PIE, so the entire thing was one executable that would get loaded at a random address at runtime (by the usual dynamic-linker mechanism), but simply didn't have a dependency on libc.so.6.
Linker "relaxing" calls when the target is present at static-link time
GAS call *bar@GOTPCREL(%rip) uses R_X86_64_GOTPCRELX (relaxable)
NASM call [rel bar wrt ..got] uses R_X86_64_GOTPCREL (not relaxable)
This is less of a problem with hand-written asm; you can just use call bar when you know the symbol will be present in another .o (rather than .so) that you're going to link. But C compilers don't know the difference between library functions and other user functions you declare with prototypes (unless you use stuff like gcc -fvisibility=hidden https://gcc.gnu.org/wiki/Visibility or attributes / pragmas).
Still, you might want to write asm source that the linker can optimize if you statically link a library, but AFAIK you can't do that with NASM. You can export a symbol as hidden (visible at static-link time, but not for dynamic linking in the final .so) with global bar:function hidden, but that's in the source file defining the function, not files accessing it.
; 1st file: defines bar
global bar
bar:
mov eax,231
syscall
; 2nd file: calls bar
call bar wrt ..plt
call [rel bar wrt ..got]
extern bar
The 2nd file, after assembling with nasm -felf64 and disassembling with objdump -drwc -Mintel to see the relocations:
0000000000000000 <.text>:
0: e8 00 00 00 00 call 0x5 1: R_X86_64_PLT32 bar-0x4
5: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # 0xb 7: R_X86_64_GOTPCREL bar-0x4
After linking with ld (GNU Binutils) 2.35.1 - ld bar.o bar2.o -o bar
0000000000401000 <_start>:
401000: e8 0b 00 00 00 call 401010 <bar>
401005: ff 15 ed 1f 00 00 call QWORD PTR [rip+0x1fed] # 402ff8 <.got>
40100b: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0]
0000000000401010 <bar>:
401010: b8 e7 00 00 00 mov eax,0xe7
401015: 0f 05 syscall
Note that the PLT form was relaxed to a direct call bar, with the PLT eliminated, but the ff 15 call [rel mem] was not relaxed to an e8 call rel32.
With GAS:
_start:
call bar@plt
call *bar@GOTPCREL(%rip)
gcc -c foo.s && disas foo.o
0000000000000000 <_start>:
0: e8 00 00 00 00 call 5 <_start+0x5> 1: R_X86_64_PLT32 bar-0x4
5: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # b <_start+0xb> 7: R_X86_64_GOTPCRELX bar-0x4
Note the X at the end of R_X86_64_GOTPCRELX.
ld bar2.o foo.o -o bar && disas bar:
0000000000401000 <bar>:
401000: b8 e7 00 00 00 mov eax,0xe7
401005: 0f 05 syscall
0000000000401007 <_start>:
401007: e8 f4 ff ff ff call 401000 <bar>
40100c: 67 e8 ee ff ff ff addr32 call 401000 <bar>
Both calls got relaxed to a direct e8 call rel32 straight to the target address. The extra byte in the indirect call is filled with a 67 address-size prefix (which has no effect on call rel32), padding the instruction to the same length. (Because it's too late to re-assemble and re-compute all relative branches within functions, and alignment and so on.)
That would happen for call *puts@GOTPCREL(%rip) if you statically linked libc, with gcc -static.
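For illustration only, the byte-level effect of that relaxation can be sketched like this. relax_got_call is a hypothetical helper that mimics the rewrite seen in the disassembly above; it is not how ld is actually implemented:

```c
#include <stdint.h>

/* Rewrite a 6-byte `ff 15 disp32` (call *sym@GOTPCREL(%rip)) in place as
 * `67 e8 rel32`, a direct call padded with an address-size prefix so the
 * instruction keeps its length. Returns 0 on success, -1 if the bytes are
 * not that instruction or the target is out of rel32 range. */
int relax_got_call(uint8_t insn[6], uint64_t insn_addr, uint64_t target)
{
    if (insn[0] != 0xff || insn[1] != 0x15)
        return -1;

    /* rel32 is relative to the end of the (still 6-byte) instruction. */
    int64_t disp = (int64_t)(target - (insn_addr + 6));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    uint32_t rel32 = (uint32_t)(int32_t)disp;
    insn[0] = 0x67;                         /* address-size prefix as padding */
    insn[1] = 0xe8;                         /* call rel32 */
    insn[2] = (uint8_t)(rel32 & 0xff);      /* little-endian displacement */
    insn[3] = (uint8_t)((rel32 >> 8) & 0xff);
    insn[4] = (uint8_t)((rel32 >> 16) & 0xff);
    insn[5] = (uint8_t)((rel32 >> 24) & 0xff);
    return 0;
}
```

Feeding it the unlinked bytes ff 15 00 00 00 00 at address 0x40100c with bar at 0x401000 produces 67 e8 ee ff ff ff, matching the linked output shown above.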
The 0xe8 opcode is followed by a signed offset to be applied to the PC (which has advanced to the next instruction by that time) to compute the branch target. Hence objdump is interpreting the branch target as 0x671.
YASM is rendering zeros because it has likely put a relocation on that offset, which is how it asks the loader to populate the correct offset for puts during loading. The loader is encountering an overflow when computing the relocation, which may indicate that puts is at a further offset from your call than can be represented in a 32-bit signed offset. Hence the loader fails to fix this instruction, and you get a crash.
66c: e8 00 00 00 00 shows the unpopulated address. If you look in your relocation table, you should see a relocation on 0x66d. It is not uncommon for the assembler to populate addresses/offsets with relocations as all zeros.
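A short sketch of how a disassembler arrives at those targets from the raw bytes (call_rel32_target is a hypothetical helper name, not part of objdump):

```c
#include <stdint.h>

/* Decode the destination of a 5-byte `e8 rel32` near call located at addr:
 * the displacement is little-endian, signed, and relative to the address
 * of the next instruction (addr + 5). */
uint64_t call_rel32_target(uint64_t addr, const uint8_t insn[5])
{
    uint32_t raw = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;
    int32_t rel32 = (int32_t)raw;
    return addr + 5 + (uint64_t)(int64_t)rel32;
}
```

Applied to e8 00 00 00 00 at 0x66c this gives 0x671 (the next instruction), and applied to e8 c6 fe ff ff at 0x645 from the compiler-generated version it gives 0x510, the puts PLT entry.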
This page suggests that YASM has a WRT directive that can control use of .got, .plt, etc.
Per §9.2.5 of the NASM documentation, it looks like you can use CALL puts WRT ..plt (presuming YASM has the same syntax).

Kernel exploit shellcode

I want to write shellcode for kernel mode on 32-bit Linux that will do this:
commit_creds (prepare_kernel_cred(0));
So I create a file with:
xor eax, eax
call 0x1234567
call 0x1234568
ret
Where 0x1234567 is the address of prepare_kernel_cred and 0x1234568 is the address of commit_creds, both found from /proc/kallsyms.
I assemble it with nasm -f elf and objdump -d it to get the machine code.
I get something like:
31 c0 which is xor eax, eax
e8 7c 67 06 c1 which is call prepare_kernel_cred
e8 7c 65 06 c1 which is call commit_creds
c3 which is ret
This doesn't work. However, using e8 79 instead of the first e8 7c, and e8 74 instead of the second e8 7c, works. I don't remember where I got this second machine code from (I had it in a different file), but I'm very curious why it works when simply assembling the code as shown above does not.
What type of CALL is this? Why doesn't it work to simply assemble the code as it is shown above? My toy exploit works fine against my artificial kernel bug if I use the e8 79 and e8 74 for the CALLs, but fails when I use the assembled machine code from nasm/objdump.
The CALL variants beginning with E8h are near calls to an address specified by a displacement relative to the current instruction. This explains why the values need to be different for different instructions. I'm at a loss for how you got nasm to emit that code, though. Are you sure this isn't homework?
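A minimal sketch of that displacement calculation follows. The helper name is hypothetical, and the target addresses used in the usage note are not stated in the question; they are inferred from the working bytes under the assumption that the code is based at address 0:

```c
#include <stdint.h>

/* Encode `call target` as a 5-byte e8 rel32 placed at call_addr (32-bit
 * mode). The displacement is measured from the end of the instruction,
 * so the same target yields different bytes at different call sites. */
void encode_call_rel32(uint8_t out[5], uint32_t call_addr, uint32_t target)
{
    uint32_t rel32 = target - (call_addr + 5);   /* wraps modulo 2^32 */
    out[0] = 0xe8;
    out[1] = (uint8_t)(rel32 & 0xff);            /* little-endian */
    out[2] = (uint8_t)((rel32 >> 8) & 0xff);
    out[3] = (uint8_t)((rel32 >> 16) & 0xff);
    out[4] = (uint8_t)((rel32 >> 24) & 0xff);
}
```

Assuming targets of 0xc1066780 and 0xc1066580, with the first call at offset 2 (after the 2-byte xor) and the second at offset 7, this yields e8 79 67 06 c1 and e8 74 65 06 c1, the bytes that work. The 7c in both calls from the object file is consistent with an unrelocated field still waiting to have the call site's own address subtracted.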
I found that I had used the following command to compile this before:
gcc -m32 -Ttext=0 -nostdlib
This gives me the same result as I had from before. I also get a warning that it defaults to starting from 0x0.
However why doesn't nasm reproduce this? I checked with objdump and the starting address seems to be 0x0 in both files.

Resources