Consider a file with only the simple 32-bit x86 assembly statement:
call 0xc1066580
If I assemble this file with nasm -f elf I get:
0: e8 7c 65 06 c1 call 0xc1066581
If I use GCC and specify -Ttext=0 and -nostdlib I get:
0: e8 7b 65 06 c1 call c1066580
-nostdlib
Do not use the standard system startup files or libraries when linking. No startup files and only the libraries you specify are passed to the linker, and options specifying linkage of the system libraries, such as -static-libgcc or -shared-libgcc, are ignored.
But what exactly does -Ttext=0 do? I use it to specify the entry address the EIP starts at when it is loaded/executed. I'm unable to find -Ttext in the manpages, when I search online I found this:
"-Ttext is an alias for "--section-start=text", which reads as:
--section-start=sectionname=org
Locate a section in the output file at the absolute address given
by org. You may use this option as many times as necessary to
locate multiple sections in the command line. org must be a single
hexadecimal integer; for compatibility with other linkers, you may
omit the leading 0x usually associated with hexadecimal values.
Note: there should be no white space between sectionname, the
equals sign ("="), and org."
From http://www.linuxquestions.org/questions/linux-general-1/gcc-creating-a-huge-executable-image-redhat-2-6-18-8-el5-x86_64-linux-759302/
However, I don't find --section or sectionname in my manpage either, and when I try to replace -Ttext with --section-name I get that this is an unrecognized argument (this is GCC 4.7.2 if it is relevant).
Could someone tell me if this explanation (of -Ttext) is accurate and where I can find it in my manual? If it is not accurate, what does -Ttext really do?
My other question is: How does one specify a similar argument as -Ttext to nasm? Or in other words, what do I need to do to make nasm produce the same output as gcc does?
I tried to execute the same assemble statements (with nasm and gcc) on both a 64-bit and 32-bit system, I get the same results.
Running ld --help gives
-Ttext ADDRESS Set address of .text section
If we assemble the following program using gcc -Ttext=8 -nostdlib -o test test.s
.globl _start
_start:
movl test,%ebx
test:
And dump the section headers (objdump -h test):
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000007 0000000000000008 0000000000000008 00200008 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
..and the code (objdump -d test):
0000000000000008 <_start>:
8: 8b 1c 25 0f 00 00 00 mov 0xf,%ebx
We can see that the .text section has a starting address of 8 and a size of 7. That is, all references to symbols within the section have been offset by the starting address we specified (8), but there was no padding involved (the section size did not grow as a result of having changed its address).
You should be able to accomplish the same thing with NASM by using the ORG directive: "NASM's ORG does exactly what the directive says: origin. Its sole function is to specify one offset which is added to all internal address references within the section".
Related
I have a function foo written in assembly and compiled with yasm and GCC on Linux (Ubuntu) 64-bit. It simply prints a message to stdout using puts(), here is how it looks:
bits 64
extern puts
global foo
section .data
message:
db 'foo() called', 0
section .text
foo:
push rbp
mov rbp, rsp
lea rdi, [rel message]
call puts
pop rbp
ret
It is called by a C program compiled with GCC:
extern void foo();
int main() {
foo();
return 0;
}
Build commands:
yasm -f elf64 foo_64_unix.asm
gcc -c foo_main.c -o foo_main.o
gcc foo_64_unix.o foo_main.o -o foo
./foo
Here is the problem:
When running the program it prints an error message and immediately segfaults during the call to puts:
./foo: Symbol `puts' causes overflow in R_X86_64_PC32 relocation
Segmentation fault
After disassembling with objdump I see that the call is made with the wrong address:
0000000000000660 <foo>:
660: 90 nop
661: 55 push %rbp
662: 48 89 e5 mov %rsp,%rbp
665: 48 8d 3d a4 09 20 00 lea 0x2009a4(%rip),%rdi
66c: e8 00 00 00 00 callq 671 <foo+0x11> <-- here
671: 5d pop %rbp
672: c3 retq
(671 is the address of the next instruction, not address of puts)
However, if I rewrite the same code in C the call is done differently:
645: e8 c6 fe ff ff callq 510 <puts#plt>
i.e. it references puts from the PLT.
Is it possible to tell yasm to generate similar code?
TL:DR: 3 options:
Build a non-PIE executable (gcc -no-pie -fno-pie call-lib.c libcall.o) so the linker will generate a PLT entry for you transparently when you write call puts.
call puts wrt ..plt like gcc -fPIE would do.
call [rel puts wrt ..got] like gcc -fno-plt would do.
The latter two will work in PIE executables or shared libraries. The 3rd way, wrt ..got, is slightly more efficient.
Your gcc is building PIE executables by default (32-bit absolute addresses no longer allowed in x86-64 Linux?).
I'm not sure why, but when doing so the linker doesn't automatically resolve call puts to call puts#plt. There is still a puts PLT entry generated, but the call doesn't go there.
At runtime, the dynamic linker tries to resolve puts directly to the libc symbol of that name and fixup the call rel32. But the symbol is more than +-2^31 away, so we get a warning about overflow of the R_X86_64_PC32 relocation. The low 32 bits of the target address are correct, but the upper bits aren't. (Thus your call jumps to a bad address).
Your code works for me if I build with gcc -no-pie -fno-pie call-lib.c libcall.o. The -no-pie is the critical part: it's the linker option. Your YASM command doesn't have to change.
When making a traditional position-dependent executable, the linker turns the puts symbol for the call target into puts#plt for you, because we're linking a dynamic executable (instead of statically linking libc with gcc -static -fno-pie, in which case the call could go directly to the libc function.)
Anyway, this is why gcc emits call puts#plt (GAS syntax) when compiling with -fpie (the default on your desktop, but not the default on https://godbolt.org/), but just call puts when compiling with -fno-pie.
See What does #plt mean here? for more about the PLT, and also Sorry state of dynamic libraries on Linux from a few years ago. (The modern gcc -fno-plt is like one of the ideas in that blog post.)
BTW, a more accurate/specific prototype would let gcc avoid zeroing EAX before calling foo:
extern void foo(); in C means extern void foo(...);
You could declare it as extern void foo(void);, which is what () means in C++. C++ doesn't allow function declarations that leave the args unspecified.
asm improvements
You can also put message in section .rodata (read-only data, linked as part of the text segment).
You don't need a stack frame, just something to align the stack by 16 before a call. A dummy push rax will do it.
Or we can tail-call puts by jumping to it instead of calling it, with the same stack position as on entry to this function. This works with or without PIE. Just replace call with jmp, as long as RSP is pointing at your own return address.
If you want to make PIE executables (or shared libraries), you have two options
call puts wrt ..plt - explicitly call through the PLT.
call [rel puts wrt ..got] - explicitly do an indirect call through the GOT entry, like gcc's -fno-plt style of code-gen. (Using a RIP-relative addressing mode to reach the GOT, hence the rel keyword).
WRT = With Respect To. The NASM manual documents wrt ..plt, and see also section 7.9.3: special symbols and WRT.
Normally you would use default rel at the top of your file so you can actually use call [puts wrt ..got] and still get a RIP-relative addressing mode. You can't use a 32-bit absolute addressing mode in PIE or PIC code.
call [puts wrt ..got] assembles to a memory-indirect call using the function pointer that dynamic linking stored in the GOT. (Early-binding, not lazy dynamic linking.)
NASM documents ..got for getting the address of variables in section 9.2.3. Functions in (other) libraries are identical: you get a pointer from the GOT instead of calling directly, because the offset isn't a link-time constant and might not fit in 32-bits.
YASM also accepts call [puts wrt ..GOTPCREL], like AT&T syntax call *puts#GOTPCREL(%rip), but NASM does not.
; don't use BITS 64. You *want* an error if you try to assemble this into a 32-bit .o
default rel ; RIP-relative addressing instead of 32-bit absolute by default; makes the [rel ...] optional
section .rodata ; .rodata is best for constants, not .data
message:
db 'foo() called', 0
section .text
global foo
foo:
sub rsp, 8 ; align the stack by 16
; PIE with PLT
lea rdi, [rel message] ; needed for PIE
call puts WRT ..plt ; tailcall puts
;or
; PIE with -fno-plt style code, skips the PLT indirection
lea rdi, [rel message]
call [rel puts wrt ..got]
;or
; non-PIE
mov edi, message ; more efficient, but only works in non-PIE / non-PIC
call puts ; linker will rewrite it into call puts#plt
add rsp,8 ; restore the stack, undoing the add
ret
In a position-dependent Linux executable, you can use mov edi, message instead of a RIP-relative LEA. It's smaller code-size and can run on more execution ports on most CPUs. (Fun fact: MacOS always puts the "image base" outside the low 4GiB so this optimization isn't possible there.)
In a non-PIE executable, you also might as well use call puts or jmp puts and let the linker sort it out, unless you want more efficient no-plt style dynamic linking. But if you do choose to statically link libc, I think this is the only way you'll get a direct jmp to the libc function.
(I think the possibility of static linking for non-PIE is why ld is willing to generate PLT stubs automatically for non-PIE, but not for PIE or shared libraries. It requires you to say what you mean when linking ELF shared objects.)
If you did use call puts in a PIE (call rel32), it could only work if you statically linked a position-independent implementation of puts into your PIE, so the entire thing was one executable that would get loaded at a random address at runtime (by the usual dynamic-linker mechanism), but simply didn't have a dependency on libc.so.6
Linker "relaxing" calls when the target is present at static-link time
GAS call *bar#GOTPCREL(%rip) uses R_X86_64_GOTPCRELX (relaxable)
NASM call [rel bar wrt ..got] uses R_X86_64_GOTPCREL (not relaxable)
This is less of a problem with hand-written asm; you can just use call bar when you know the symbol will be present in another .o (rather than .so) that you're going to link. But C compilers don't know the difference between library functions and other user functions you declare with prototypes (unless you use stuff like gcc -fvisibility=hidden https://gcc.gnu.org/wiki/Visibility or attributes / pragmas).
Still, you might want to write asm source that the linker can optimize if you statically link a library, but AFAIK you can't do that with NASM. You can export a symbol as hidden (visible at static-link time, but not for dynamic linking in the final .so) with global bar:function hidden, but that's in the source file defining the function, not files accessing it.
global bar
bar:
mov eax,231
syscall
call bar wrt ..plt
call [rel bar wrt ..got]
extern bar
The 2nd file, after assembling with nasm -felf64 and disassembling with objdump -drwc -Mintel to see the relocations:
0000000000000000 <.text>:
0: e8 00 00 00 00 call 0x5 1: R_X86_64_PLT32 bar-0x4
5: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # 0xb 7: R_X86_64_GOTPCREL bar-0x4
After linking with ld (GNU Binutils) 2.35.1 - ld bar.o bar2.o -o bar
0000000000401000 <_start>:
401000: e8 0b 00 00 00 call 401010 <bar>
401005: ff 15 ed 1f 00 00 call QWORD PTR [rip+0x1fed] # 402ff8 <.got>
40100b: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0]
0000000000401010 <bar>:
401010: b8 e7 00 00 00 mov eax,0xe7
401015: 0f 05 syscall
Note that the PLT form got relaxed to just a direct call bar, PLT eliminated. But the ff 15 call [rel mem] was not relaxed to an e8 rel32
With GAS:
_start:
call bar#plt
call *bar#GOTPCREL(%rip)
gcc -c foo.s && disas foo.o
0000000000000000 <_start>:
0: e8 00 00 00 00 call 5 <_start+0x5> 1: R_X86_64_PLT32 bar-0x4
5: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # b <_start+0xb> 7: R_X86_64_GOTPCRELX bar-0x4
Note the X at the end of R_X86_64_GOTPCRELX.
ld bar2.o foo.o -o bar && disas bar:
0000000000401000 <bar>:
401000: b8 e7 00 00 00 mov eax,0xe7
401005: 0f 05 syscall
0000000000401007 <_start>:
401007: e8 f4 ff ff ff call 401000 <bar>
40100c: 67 e8 ee ff ff ff addr32 call 401000 <bar>
Both calls got relaxed to a direct e8 call rel32 straight to the target address. The extra byte in indirect call is filled with a 67 address-size prefix (which has no effect on call rel32), padding the instruction to the same length. (Because it's too late to re-assemble and re-compute all relative branches within functions, and alignment and so on.)
That would happen for call *puts#GOTPCREL(%rip) if you statically linked libc, with gcc -static.
The 0xe8 opcode is followed by a signed offset to be applied to the PC (which has advanced to the next instruction by that time) to compute the branch target. Hence objdump is interpreting the branch target as 0x671.
YASM is rendering zeros because it has likely put a relocation on that offset, which is how it asks the loader to populate the correct offset for puts during loading. The loader is encountering an overflow when computing the relocation, which may indicate that puts is at a further offset from your call than can be represented in a 32-bit signed offset. Hence the loader fails to fix this instruction, and you get a crash.
66c: e8 00 00 00 00 shows the unpopulated address. If you look in your relocation table, you should see a relocation on 0x66d. It is not uncommon for the assembler to populate addresses/offsets with relocations as all zeros.
This page suggests that YASM has a WRT directive that can control use of .got, .plt, etc.
Per S9.2.5 on the NASM documentation, it looks like you can use CALL puts WRT ..plt (presuming YASM has the same syntax).
I have a function foo written in assembly and compiled with yasm and GCC on Linux (Ubuntu) 64-bit. It simply prints a message to stdout using puts(), here is how it looks:
bits 64
extern puts
global foo
section .data
message:
db 'foo() called', 0
section .text
foo:
push rbp
mov rbp, rsp
lea rdi, [rel message]
call puts
pop rbp
ret
It is called by a C program compiled with GCC:
extern void foo();
int main() {
foo();
return 0;
}
Build commands:
yasm -f elf64 foo_64_unix.asm
gcc -c foo_main.c -o foo_main.o
gcc foo_64_unix.o foo_main.o -o foo
./foo
Here is the problem:
When running the program it prints an error message and immediately segfaults during the call to puts:
./foo: Symbol `puts' causes overflow in R_X86_64_PC32 relocation
Segmentation fault
After disassembling with objdump I see that the call is made with the wrong address:
0000000000000660 <foo>:
660: 90 nop
661: 55 push %rbp
662: 48 89 e5 mov %rsp,%rbp
665: 48 8d 3d a4 09 20 00 lea 0x2009a4(%rip),%rdi
66c: e8 00 00 00 00 callq 671 <foo+0x11> <-- here
671: 5d pop %rbp
672: c3 retq
(671 is the address of the next instruction, not address of puts)
However, if I rewrite the same code in C the call is done differently:
645: e8 c6 fe ff ff callq 510 <puts#plt>
i.e. it references puts from the PLT.
Is it possible to tell yasm to generate similar code?
TL:DR: 3 options:
Build a non-PIE executable (gcc -no-pie -fno-pie call-lib.c libcall.o) so the linker will generate a PLT entry for you transparently when you write call puts.
call puts wrt ..plt like gcc -fPIE would do.
call [rel puts wrt ..got] like gcc -fno-plt would do.
The latter two will work in PIE executables or shared libraries. The 3rd way, wrt ..got, is slightly more efficient.
Your gcc is building PIE executables by default (32-bit absolute addresses no longer allowed in x86-64 Linux?).
I'm not sure why, but when doing so the linker doesn't automatically resolve call puts to call puts#plt. There is still a puts PLT entry generated, but the call doesn't go there.
At runtime, the dynamic linker tries to resolve puts directly to the libc symbol of that name and fixup the call rel32. But the symbol is more than +-2^31 away, so we get a warning about overflow of the R_X86_64_PC32 relocation. The low 32 bits of the target address are correct, but the upper bits aren't. (Thus your call jumps to a bad address).
Your code works for me if I build with gcc -no-pie -fno-pie call-lib.c libcall.o. The -no-pie is the critical part: it's the linker option. Your YASM command doesn't have to change.
When making a traditional position-dependent executable, the linker turns the puts symbol for the call target into puts#plt for you, because we're linking a dynamic executable (instead of statically linking libc with gcc -static -fno-pie, in which case the call could go directly to the libc function.)
Anyway, this is why gcc emits call puts#plt (GAS syntax) when compiling with -fpie (the default on your desktop, but not the default on https://godbolt.org/), but just call puts when compiling with -fno-pie.
See What does #plt mean here? for more about the PLT, and also Sorry state of dynamic libraries on Linux from a few years ago. (The modern gcc -fno-plt is like one of the ideas in that blog post.)
BTW, a more accurate/specific prototype would let gcc avoid zeroing EAX before calling foo:
extern void foo(); in C means extern void foo(...);
You could declare it as extern void foo(void);, which is what () means in C++. C++ doesn't allow function declarations that leave the args unspecified.
asm improvements
You can also put message in section .rodata (read-only data, linked as part of the text segment).
You don't need a stack frame, just something to align the stack by 16 before a call. A dummy push rax will do it.
Or we can tail-call puts by jumping to it instead of calling it, with the same stack position as on entry to this function. This works with or without PIE. Just replace call with jmp, as long as RSP is pointing at your own return address.
If you want to make PIE executables (or shared libraries), you have two options
call puts wrt ..plt - explicitly call through the PLT.
call [rel puts wrt ..got] - explicitly do an indirect call through the GOT entry, like gcc's -fno-plt style of code-gen. (Using a RIP-relative addressing mode to reach the GOT, hence the rel keyword).
WRT = With Respect To. The NASM manual documents wrt ..plt, and see also section 7.9.3: special symbols and WRT.
Normally you would use default rel at the top of your file so you can actually use call [puts wrt ..got] and still get a RIP-relative addressing mode. You can't use a 32-bit absolute addressing mode in PIE or PIC code.
call [puts wrt ..got] assembles to a memory-indirect call using the function pointer that dynamic linking stored in the GOT. (Early-binding, not lazy dynamic linking.)
NASM documents ..got for getting the address of variables in section 9.2.3. Functions in (other) libraries are identical: you get a pointer from the GOT instead of calling directly, because the offset isn't a link-time constant and might not fit in 32-bits.
YASM also accepts call [puts wrt ..GOTPCREL], like AT&T syntax call *puts#GOTPCREL(%rip), but NASM does not.
; don't use BITS 64. You *want* an error if you try to assemble this into a 32-bit .o
default rel ; RIP-relative addressing instead of 32-bit absolute by default; makes the [rel ...] optional
section .rodata ; .rodata is best for constants, not .data
message:
db 'foo() called', 0
section .text
global foo
foo:
sub rsp, 8 ; align the stack by 16
; PIE with PLT
lea rdi, [rel message] ; needed for PIE
call puts WRT ..plt ; tailcall puts
;or
; PIE with -fno-plt style code, skips the PLT indirection
lea rdi, [rel message]
call [rel puts wrt ..got]
;or
; non-PIE
mov edi, message ; more efficient, but only works in non-PIE / non-PIC
call puts ; linker will rewrite it into call puts#plt
add rsp,8 ; restore the stack, undoing the add
ret
In a position-dependent Linux executable, you can use mov edi, message instead of a RIP-relative LEA. It's smaller code-size and can run on more execution ports on most CPUs. (Fun fact: MacOS always puts the "image base" outside the low 4GiB so this optimization isn't possible there.)
In a non-PIE executable, you also might as well use call puts or jmp puts and let the linker sort it out, unless you want more efficient no-plt style dynamic linking. But if you do choose to statically link libc, I think this is the only way you'll get a direct jmp to the libc function.
(I think the possibility of static linking for non-PIE is why ld is willing to generate PLT stubs automatically for non-PIE, but not for PIE or shared libraries. It requires you to say what you mean when linking ELF shared objects.)
If you did use call puts in a PIE (call rel32), it could only work if you statically linked a position-independent implementation of puts into your PIE, so the entire thing was one executable that would get loaded at a random address at runtime (by the usual dynamic-linker mechanism), but simply didn't have a dependency on libc.so.6
Linker "relaxing" calls when the target is present at static-link time
GAS call *bar#GOTPCREL(%rip) uses R_X86_64_GOTPCRELX (relaxable)
NASM call [rel bar wrt ..got] uses R_X86_64_GOTPCREL (not relaxable)
This is less of a problem with hand-written asm; you can just use call bar when you know the symbol will be present in another .o (rather than .so) that you're going to link. But C compilers don't know the difference between library functions and other user functions you declare with prototypes (unless you use stuff like gcc -fvisibility=hidden https://gcc.gnu.org/wiki/Visibility or attributes / pragmas).
Still, you might want to write asm source that the linker can optimize if you statically link a library, but AFAIK you can't do that with NASM. You can export a symbol as hidden (visible at static-link time, but not for dynamic linking in the final .so) with global bar:function hidden, but that's in the source file defining the function, not files accessing it.
global bar
bar:
mov eax,231
syscall
call bar wrt ..plt
call [rel bar wrt ..got]
extern bar
The 2nd file, after assembling with nasm -felf64 and disassembling with objdump -drwc -Mintel to see the relocations:
0000000000000000 <.text>:
0: e8 00 00 00 00 call 0x5 1: R_X86_64_PLT32 bar-0x4
5: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # 0xb 7: R_X86_64_GOTPCREL bar-0x4
After linking with ld (GNU Binutils) 2.35.1 - ld bar.o bar2.o -o bar
0000000000401000 <_start>:
401000: e8 0b 00 00 00 call 401010 <bar>
401005: ff 15 ed 1f 00 00 call QWORD PTR [rip+0x1fed] # 402ff8 <.got>
40100b: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0]
0000000000401010 <bar>:
401010: b8 e7 00 00 00 mov eax,0xe7
401015: 0f 05 syscall
Note that the PLT form got relaxed to just a direct call bar, PLT eliminated. But the ff 15 call [rel mem] was not relaxed to an e8 rel32
With GAS:
_start:
call bar#plt
call *bar#GOTPCREL(%rip)
gcc -c foo.s && disas foo.o
0000000000000000 <_start>:
0: e8 00 00 00 00 call 5 <_start+0x5> 1: R_X86_64_PLT32 bar-0x4
5: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # b <_start+0xb> 7: R_X86_64_GOTPCRELX bar-0x4
Note the X at the end of R_X86_64_GOTPCRELX.
ld bar2.o foo.o -o bar && disas bar:
0000000000401000 <bar>:
401000: b8 e7 00 00 00 mov eax,0xe7
401005: 0f 05 syscall
0000000000401007 <_start>:
401007: e8 f4 ff ff ff call 401000 <bar>
40100c: 67 e8 ee ff ff ff addr32 call 401000 <bar>
Both calls got relaxed to a direct e8 call rel32 straight to the target address. The extra byte in indirect call is filled with a 67 address-size prefix (which has no effect on call rel32), padding the instruction to the same length. (Because it's too late to re-assemble and re-compute all relative branches within functions, and alignment and so on.)
That would happen for call *puts#GOTPCREL(%rip) if you statically linked libc, with gcc -static.
The 0xe8 opcode is followed by a signed offset to be applied to the PC (which has advanced to the next instruction by that time) to compute the branch target. Hence objdump is interpreting the branch target as 0x671.
YASM is rendering zeros because it has likely put a relocation on that offset, which is how it asks the loader to populate the correct offset for puts during loading. The loader is encountering an overflow when computing the relocation, which may indicate that puts is at a further offset from your call than can be represented in a 32-bit signed offset. Hence the loader fails to fix this instruction, and you get a crash.
66c: e8 00 00 00 00 shows the unpopulated address. If you look in your relocation table, you should see a relocation on 0x66d. It is not uncommon for the assembler to populate addresses/offsets with relocations as all zeros.
This page suggests that YASM has a WRT directive that can control use of .got, .plt, etc.
Per S9.2.5 on the NASM documentation, it looks like you can use CALL puts WRT ..plt (presuming YASM has the same syntax).
I have a function foo written in assembly and compiled with yasm and GCC on Linux (Ubuntu) 64-bit. It simply prints a message to stdout using puts(), here is how it looks:
bits 64
extern puts
global foo
section .data
message:
db 'foo() called', 0
section .text
foo:
push rbp
mov rbp, rsp
lea rdi, [rel message]
call puts
pop rbp
ret
It is called by a C program compiled with GCC:
extern void foo();
int main() {
foo();
return 0;
}
Build commands:
yasm -f elf64 foo_64_unix.asm
gcc -c foo_main.c -o foo_main.o
gcc foo_64_unix.o foo_main.o -o foo
./foo
Here is the problem:
When running the program it prints an error message and immediately segfaults during the call to puts:
./foo: Symbol `puts' causes overflow in R_X86_64_PC32 relocation
Segmentation fault
After disassembling with objdump I see that the call is made with the wrong address:
0000000000000660 <foo>:
660: 90 nop
661: 55 push %rbp
662: 48 89 e5 mov %rsp,%rbp
665: 48 8d 3d a4 09 20 00 lea 0x2009a4(%rip),%rdi
66c: e8 00 00 00 00 callq 671 <foo+0x11> <-- here
671: 5d pop %rbp
672: c3 retq
(671 is the address of the next instruction, not address of puts)
However, if I rewrite the same code in C the call is done differently:
645: e8 c6 fe ff ff callq 510 <puts#plt>
i.e. it references puts from the PLT.
Is it possible to tell yasm to generate similar code?
TL:DR: 3 options:
Build a non-PIE executable (gcc -no-pie -fno-pie call-lib.c libcall.o) so the linker will generate a PLT entry for you transparently when you write call puts.
call puts wrt ..plt like gcc -fPIE would do.
call [rel puts wrt ..got] like gcc -fno-plt would do.
The latter two will work in PIE executables or shared libraries. The 3rd way, wrt ..got, is slightly more efficient.
Your gcc is building PIE executables by default (32-bit absolute addresses no longer allowed in x86-64 Linux?).
I'm not sure why, but when doing so the linker doesn't automatically resolve call puts to call puts#plt. There is still a puts PLT entry generated, but the call doesn't go there.
At runtime, the dynamic linker tries to resolve puts directly to the libc symbol of that name and fixup the call rel32. But the symbol is more than +-2^31 away, so we get a warning about overflow of the R_X86_64_PC32 relocation. The low 32 bits of the target address are correct, but the upper bits aren't. (Thus your call jumps to a bad address).
Your code works for me if I build with gcc -no-pie -fno-pie call-lib.c libcall.o. The -no-pie is the critical part: it's the linker option. Your YASM command doesn't have to change.
When making a traditional position-dependent executable, the linker turns the puts symbol for the call target into puts#plt for you, because we're linking a dynamic executable (instead of statically linking libc with gcc -static -fno-pie, in which case the call could go directly to the libc function.)
Anyway, this is why gcc emits call puts#plt (GAS syntax) when compiling with -fpie (the default on your desktop, but not the default on https://godbolt.org/), but just call puts when compiling with -fno-pie.
See What does #plt mean here? for more about the PLT, and also Sorry state of dynamic libraries on Linux from a few years ago. (The modern gcc -fno-plt is like one of the ideas in that blog post.)
BTW, a more accurate/specific prototype would let gcc avoid zeroing EAX before calling foo:
extern void foo(); in C means extern void foo(...);
You could declare it as extern void foo(void);, which is what () means in C++. C++ doesn't allow function declarations that leave the args unspecified.
asm improvements
You can also put message in section .rodata (read-only data, linked as part of the text segment).
You don't need a stack frame, just something to align the stack by 16 before a call. A dummy push rax will do it.
Or we can tail-call puts by jumping to it instead of calling it, with the same stack position as on entry to this function. This works with or without PIE. Just replace call with jmp, as long as RSP is pointing at your own return address.
If you want to make PIE executables (or shared libraries), you have two options
call puts wrt ..plt - explicitly call through the PLT.
call [rel puts wrt ..got] - explicitly do an indirect call through the GOT entry, like gcc's -fno-plt style of code-gen. (Using a RIP-relative addressing mode to reach the GOT, hence the rel keyword).
WRT = With Respect To. The NASM manual documents wrt ..plt, and see also section 7.9.3: special symbols and WRT.
Normally you would use default rel at the top of your file so you can actually use call [puts wrt ..got] and still get a RIP-relative addressing mode. You can't use a 32-bit absolute addressing mode in PIE or PIC code.
call [puts wrt ..got] assembles to a memory-indirect call using the function pointer that dynamic linking stored in the GOT. (Early-binding, not lazy dynamic linking.)
NASM documents ..got for getting the address of variables in section 9.2.3. Functions in (other) libraries are identical: you get a pointer from the GOT instead of calling directly, because the offset isn't a link-time constant and might not fit in 32-bits.
YASM also accepts call [puts wrt ..GOTPCREL], like AT&T syntax call *puts#GOTPCREL(%rip), but NASM does not.
; don't use BITS 64. You *want* an error if you try to assemble this into a 32-bit .o
default rel ; RIP-relative addressing instead of 32-bit absolute by default; makes the [rel ...] optional
section .rodata ; .rodata is best for constants, not .data
message:
db 'foo() called', 0
section .text
global foo
foo:
sub rsp, 8 ; align the stack by 16
; PIE with PLT
lea rdi, [rel message] ; needed for PIE
call puts WRT ..plt ; tailcall puts
;or
; PIE with -fno-plt style code, skips the PLT indirection
lea rdi, [rel message]
call [rel puts wrt ..got]
;or
; non-PIE
mov edi, message ; more efficient, but only works in non-PIE / non-PIC
call puts ; linker will rewrite it into call puts#plt
add rsp,8 ; restore the stack, undoing the add
ret
In a position-dependent Linux executable, you can use mov edi, message instead of a RIP-relative LEA. It's smaller code-size and can run on more execution ports on most CPUs. (Fun fact: MacOS always puts the "image base" outside the low 4GiB so this optimization isn't possible there.)
In a non-PIE executable, you also might as well use call puts or jmp puts and let the linker sort it out, unless you want more efficient no-plt style dynamic linking. But if you do choose to statically link libc, I think this is the only way you'll get a direct jmp to the libc function.
(I think the possibility of static linking for non-PIE is why ld is willing to generate PLT stubs automatically for non-PIE, but not for PIE or shared libraries. It requires you to say what you mean when linking ELF shared objects.)
If you did use call puts in a PIE (call rel32), it could only work if you statically linked a position-independent implementation of puts into your PIE, so the entire thing was one executable that would get loaded at a random address at runtime (by the usual dynamic-linker mechanism), but simply didn't have a dependency on libc.so.6
Linker "relaxing" calls when the target is present at static-link time
GAS call *bar#GOTPCREL(%rip) uses R_X86_64_GOTPCRELX (relaxable)
NASM call [rel bar wrt ..got] uses R_X86_64_GOTPCREL (not relaxable)
This is less of a problem with hand-written asm; you can just use call bar when you know the symbol will be present in another .o (rather than .so) that you're going to link. But C compilers don't know the difference between library functions and other user functions you declare with prototypes (unless you use stuff like gcc -fvisibility=hidden https://gcc.gnu.org/wiki/Visibility or attributes / pragmas).
Still, you might want to write asm source that the linker can optimize if you statically link a library, but AFAIK you can't do that with NASM. You can export a symbol as hidden (visible at static-link time, but not for dynamic linking in the final .so) with global bar:function hidden, but that's in the source file defining the function, not files accessing it.
global bar
bar:
mov eax,231
syscall
call bar wrt ..plt
call [rel bar wrt ..got]
extern bar
The 2nd file, after assembling with nasm -felf64 and disassembling with objdump -drwc -Mintel to see the relocations:
0000000000000000 <.text>:
0: e8 00 00 00 00 call 0x5 1: R_X86_64_PLT32 bar-0x4
5: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # 0xb 7: R_X86_64_GOTPCREL bar-0x4
After linking with ld (GNU Binutils) 2.35.1 - ld bar.o bar2.o -o bar
0000000000401000 <_start>:
401000: e8 0b 00 00 00 call 401010 <bar>
401005: ff 15 ed 1f 00 00 call QWORD PTR [rip+0x1fed] # 402ff8 <.got>
40100b: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0]
0000000000401010 <bar>:
401010: b8 e7 00 00 00 mov eax,0xe7
401015: 0f 05 syscall
Note that the PLT form got relaxed to just a direct call bar, PLT eliminated. But the ff 15 call [rel mem] was not relaxed to an e8 rel32
With GAS:
_start:
call bar#plt
call *bar#GOTPCREL(%rip)
gcc -c foo.s && disas foo.o
0000000000000000 <_start>:
0: e8 00 00 00 00 call 5 <_start+0x5> 1: R_X86_64_PLT32 bar-0x4
5: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # b <_start+0xb> 7: R_X86_64_GOTPCRELX bar-0x4
Note the X at the end of R_X86_64_GOTPCRELX.
ld bar2.o foo.o -o bar && disas bar:
0000000000401000 <bar>:
401000: b8 e7 00 00 00 mov eax,0xe7
401005: 0f 05 syscall
0000000000401007 <_start>:
401007: e8 f4 ff ff ff call 401000 <bar>
40100c: 67 e8 ee ff ff ff addr32 call 401000 <bar>
Both calls got relaxed to a direct e8 call rel32 straight to the target address. The extra byte in indirect call is filled with a 67 address-size prefix (which has no effect on call rel32), padding the instruction to the same length. (Because it's too late to re-assemble and re-compute all relative branches within functions, and alignment and so on.)
That would happen for call *puts#GOTPCREL(%rip) if you statically linked libc, with gcc -static.
The 0xe8 opcode is followed by a signed offset to be applied to the PC (which has advanced to the next instruction by that time) to compute the branch target. Hence objdump is interpreting the branch target as 0x671.
YASM is rendering zeros because it has likely put a relocation on that offset, which is how it asks the loader to populate the correct offset for puts during loading. The loader is encountering an overflow when computing the relocation, which may indicate that puts is at a further offset from your call than can be represented in a 32-bit signed offset. Hence the loader fails to fix this instruction, and you get a crash.
66c: e8 00 00 00 00 shows the unpopulated address. If you look in your relocation table, you should see a relocation on 0x66d. It is not uncommon for the assembler to populate addresses/offsets with relocations as all zeros.
This page suggests that YASM has a WRT directive that can control use of .got, .plt, etc.
Per S9.2.5 on the NASM documentation, it looks like you can use CALL puts WRT ..plt (presuming YASM has the same syntax).
Given the following assembly I obtained from GDB, is there any way to determine what function is being imported? The GDB comments tell us that we are importing puts, but how would I figure this out statically?
Dump of assembler code for function puts#plt:
0x00000000004003b8 <+0>: jmpq *0x2004a2(%rip) # 0x600860 <puts#got.plt>
0x00000000004003be <+6>: pushq $0x0
0x00000000004003c3 <+11>: jmpq 0x4003a8
You can use objump:
objdump -d foo
And you can pipe objdump output to grep.
objdump -d foo | grep -A 3 'puts#plt>:'
The number after -A controls how many lines (instructions) are printed after the match.
The answer is somewhat processor- and OS-specific, different ELF implementations may do it in different ways.
For x64 Linux (and most other implementation that use PLT), you can inspect the dynamic relocations of the ELF:
> readelf -dr file.elf
...
Relocation section '.rela.plt' at offset 0x700 contains 17 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000602018 000100000007 R_X86_64_JUMP_SLO 0000000000000000 __strcat_chk + 0
000000602020 000200000007 R_X86_64_JUMP_SLO 0000000000000000 free + 0
000000602028 000400000007 R_X86_64_JUMP_SLO 0000000000000000 clock + 0
000000602030 000500000007 R_X86_64_JUMP_SLO 0000000000000000 fclose + 0
The "Offset" is the address of the pointer in the .got.plt section which is being used by the jmpq instructions in the stubs of the .plt section.
Initially that pointer points to the dynamic loader stub, which will resolve the address on the first call (using info from the relocation entries) and will patch the pointer in the GOT to the resolved address so that it jumps directly to the target on next calls.
See this post (and previous one) for a more extensive explanation of this mechanism.
What's the best tool for converting PE binaries to ELF binaries?
Following is a brief motivation for this question:
Suppose I have a simple C program.
I compiled it using gcc for linux(this gives ELF), and using 'i586-mingw32msvc-gcc' for Windows(this gives a PE binary).
I want to analyze these two binaries for similarities, using Bitblaze's static analysis tool - vine(http://bitblaze.cs.berkeley.edu/vine.html)
Now vine doesn't have a good support for PE binaries, so I wanted to convert PE->ELF, and then carry on with my comparison/analysis.
Since all the analysis has to run on Linux, I would prefer a utility/tool that runs on Linux.
Thanks
It is possible to rebuild an EXE as an ELF binary, but the resulting binary will segfault very soon after loading, due to the missing operating system.
Here's one method of doing it.
Summary
Dump the section headers of the EXE file.
Extract the raw section data from the EXE.
Encapsulate the raw section data in GNU linker script snippets.
Write a linker script to build an ELF binary, including those scripts from the previous step.
Run ld with the linker script to produce the ELF file.
Run the new program, and watch it segfault as it's not running on Windows (and it tries to call functions in the Import Address Table, which doesn't exist).
Detailed Example
Dump the section headers of the EXE file. I'm using objdump from the mingw cross compiler package to do this.
$ i686-pc-mingw32-objdump -h trek.exe
trek.exe: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 AUTO 00172600 00401000 00401000 00000400 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .idata 00001400 00574000 00574000 00172a00 2**2
CONTENTS, ALLOC, LOAD, DATA
2 DGROUP 0002b600 00576000 00576000 00173e00 2**2
CONTENTS, ALLOC, LOAD, DATA
3 .bss 000e7800 005a2000 005a2000 00000000 2**2
ALLOC
4 .reloc 00013000 0068a000 0068a000 0019f400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .rsrc 00000a00 0069d000 0069d000 001b2400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
Use dd (or a hex editor) to extract the raw section data from the EXE. Here, I'm just going to copy the code and data sections (named AUTO and DGROUP in this example). You may want to copy additional sections though.
$ dd bs=512 skip=2 count=2963 if=trek.exe of=code.bin
$ dd bs=512 skip=2975 count=347 if=trek.exe of=data.bin
Note, I've converted the file offsets and section sizes from hex to decimal to use as skip and count, but I'm using a block size of 512 bytes in dd to speed up the process (example: 0x0400 = 1024 bytes = 2 blocks # 512 bytes).
Encapsulate the raw section data in GNU ld linker scripts snippets (using the BYTE directive). This will be used to populate the sections.
cat code.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >code.ld
cat data.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >data.ld
Write a linker script to build an ELF binary, including those scripts from the previous step. Note I've also set aside space for the uninitialized data (.bss) section.
start = 0x516DE8;
ENTRY(start)
OUTPUT_FORMAT("elf32-i386")
SECTIONS {
.text 0x401000 :
{
INCLUDE "code.ld";
}
.data 0x576000 :
{
INCLUDE "data.ld";
}
.bss 0x5A2000 :
{
. = . + 0x0E7800;
}
}
Run the linker script with GNU ld to produce the ELF file. Note I have to use an emulation mode elf_i386 since I'm using 64-bit Linux, otherwise a 64-bit ELF would be produced.
$ ld -o elf_trek -m elf_i386 elf_trek.ld
ld: warning: elf_trek.ld contains output sections; did you forget -T?
$ file elf_trek
elf_trek: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
statically linked, not stripped
Run the new program, and watch it segfault as it's not running on Windows.
$ gdb elf_trek
(gdb) run
Starting program: /home/quasar/src/games/botf/elf_trek
Program received signal SIGSEGV, Segmentation fault.
0x0051d8e6 in ?? ()
(gdb) bt
\#0 0x0051d8e6 in ?? ()
\#1 0x00000000 in ?? ()
(gdb) x/i $eip
=> 0x51d8e6: sub (%edx),%eax
(gdb) quit
IDA Pro output for that location:
0051D8DB ; size_t stackavail(void)
0051D8DB proc stackavail near
0051D8DB push edx
0051D8DC call [ds:off_5A0588]
0051D8E2 mov edx, eax
0051D8E4 mov eax, esp
0051D8E6 sub eax, [edx]
0051D8E8 pop edx
0051D8E9 retn
0051D8E9 endp stackavail
For porting binaries to Linux, this is kind of pointless, given the Wine project.
For situations like the OP's, it may be appropriate.
I've found a simpler way to do this. Use the strip command.
Example
strip -O elf32-i386 -o myprogram.elf myprogram.exe
The -O elf32-i386 has it write out the file in that format.
To see supported formats run
strip --info
I am using the strip command from mxe, which on my system is actually named /opt/mxe/usr/bin/i686-w64-mingw32.static-strip.
I don't know whether this totally fits your needs, but is it an option for you to cross-compile with your MinGW version of gcc?
I mean do say: does it suit your needs to have i586-mingw32msvc-gcc compile direct to ELF format binaries (instead of the PEs you're currently getting). A description of how to do things in the other direction can be found here - I imagine it will be a little hacky but entirely possible to make this work for you in the other direction (I must admit I haven't tried it).