I wrote simple shared library:
extern void some_func(void);
void
function(void)
{
some_func();
}
Compiled/built:
gcc -fPIC -mcmodel=large -c test.c -o test.o
gcc -fPIC -shared test.o -o libtest.so
Disassembled, to see how some_func is referenced:
$ objdump -d libtest.so
00000000000006a0 <function>:
6a0: 55 push %rbp
6a1: 48 89 e5 mov %rsp,%rbp
6a4: 41 57 push %r15
6a6: 48 83 ec 08 sub $0x8,%rsp
6aa: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 6aa <function+0xa>
6b1: 49 bb 56 09 20 00 00 movabs $0x200956,%r11
6b8: 00 00 00
6bb: 4c 01 d8 add %r11,%rax
6be: 49 89 c7 mov %rax,%r15
6c1: 48 ba 80 f5 df ff ff movabs $0xffffffffffdff580,%rdx
6c8: ff ff ff
6cb: 48 01 c2 add %rax,%rdx
6ce: ff d2 callq *%rdx
6d0: 90 nop
6d1: 48 83 c4 08 add $0x8,%rsp
6d5: 41 5f pop %r15
6d7: 5d pop %rbp
6d8: c3 retq
Looked where .got.plt is located:
$ readelf -S libtest.so
...
[21] .got.plt PROGBITS 0000000000201000 00001000
0000000000000020 0000000000000008 WA 0 0 8
...
What is the relocation:
$ readelf -r libtest.so
Relocation section '.rela.plt' at offset 0x538 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000201018 000400000007 R_X86_64_JUMP_SLO 0000000000000000 some_func + 0
In 6aa-6bb we get absolute location of GOT: 6aa + 0x200956 = 0x201000
That agrees with readelf -S libtest.so 's output.
We skip 3 reserved bytes in GOT(functions-related) and determine that some_func's absolute address should be found at +0x18(forth byte from GOT) offset at runtime.
That agrees with readelf -r libtest.so.
But 6c1 instruction in objdump's disassembly shows:
movabs $0xfff...dff580, %rdx
I expect that source operand will hold +0x18 (offset from GOT, its address located at rax), but instead it has some large negative number.
Could you explain what it shows that number but not 0x18?
There are two kinds of relocations: static and dynamic (1); one for static linker ld and other for loader (dynamic linker, rtld) - ld-linux.so.2 for linux's glibc 2.* (check Dynamic Linking and Loading, 1999 or Static Linkers and Dyanmic Link Loaders).
When you use objdump to dump relocations, it has -r option for static relocations, and -R for dynamic relocations.
Your case is not just GOT, it is GOT.PLT - GOT used for procedute linkage. This kind of access uses dynamic relocations. So, you should check output of objdump -dR libtest.so, it will show you both disassembly and dynamic relocations in it.
Cited line from readelf -r libtest.so is just for PLT table, not for the code.
http://www.airs.com/blog/archives/41
or function calls, the program linker will set up a PLT entry to look like this:
jmp *offset(%ebx)
pushl #index
jmp first_plt_entry
The program linker will allocate an entry in the GOT for each entry in
the PLT. It will create a dynamic relocation for the GOT entry of type
JMP_SLOT. It will initialize the GOT entry to the base address of the
shared library plus the address of the second instruction in the code
sequence above. When the dynamic linker does the initial lazy binding
on a JMP_SLOT reloc, it will simply add the difference between the
shared library load address and the shared library base address to the
GOT entry. The effect is that the first jmp instruction will jump to
the second instruction, which will push the index entry and branch to
the first PLT entry. The first PLT entry is special, and looks like this:
pushl 4(%ebx)
jmp *8(%ebx)
This references the second and third entries in the GOT. The dynamic
linker will initialize them to have appropriate values for a callback
into the dynamic linker itself. The dynamic linker will use the index
pushed by the first code sequence to find the JMP_SLOT relocation.
When the dynamic linker determines the function to be called, it will
store the address of the function into the GOT entry references by the
first code sequence. Thus, the next time the function is called, the
jmp instruction will branch directly to the right code.
Related
I'm following this tutorial to the "T" here about creating a dynamically-linked shared library in Linux, and when I follow the instructions, gcc seems to statically link the library instead.
The tutorial proposes 3 files: foo.c, foo.h, and main.c. Main includes foo.h and calls foo(), defined in foo.c.
I made one little change from the tutorial for debugging... my foo looks like this:
void foo(void) {
int i = 54321;
printf( "Shared lib: %d\n", i );
}
It tells me to compile using these 3 steps:
gcc -c -Wall -Werror -fpic foo.c
gcc -shared -o libfoo.so foo.o
gcc -L/home/username/foo -Wall -o test main.c -lfoo
When I run ./test, it works, I can see the "hello 54321" from foo(). In fact, it works so well, it works if I delete libfoo.so. Seemed suspicious, so I did objdump -S test and found this little guy in the object file:
000000000000068a <foo>:
68a: 55 push %rbp
68b: 48 89 e5 mov %rsp,%rbp
68e: 48 83 ec 10 sub $0x10,%rsp
692: c7 45 fc 31 d4 00 00 movl $0xd431,-0x4(%rbp)
^^^ there's my constant, 54321, in hex.
should be in the "dynamic" object, not here, right?
699: 8b 45 fc mov -0x4(%rbp),%eax
69c: 89 c6 mov %eax,%esi
69e: 48 8d 3d af 00 00 00 lea 0xaf(%rip),%rdi # 754 <_IO_stdin_used+0x4>
6a5: b8 00 00 00 00 mov $0x0,%eax
6aa: e8 b1 fe ff ff callq 560 <printf#plt>
6af: 90 nop
6b0: c9 leaveq
6b1: c3 retq
What am I doing wrong?
Thank you in advance...
P.S. Compiling on x86_64 Debian Stretch using version gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0
What am I doing wrong?
You most likely have libfoo.a in the /home/username/foo directory.
Or you accidentally used #include "foo.c" when #include "foo.h" was intended.
You can try to figure out where the definition of foo() is coming into test with:
gcc -L/home/username/foo -Wall -o test main.c -lfoo -Wl,-y,foo
which should show reference to foo coming from some /tmp/xyz.o and the definition coming from <somewhere>.
While I was reading http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/#id1
question came:
How does PIC shared library after being loaded somewhere in virtual address space of the process knows how to reference external variables?
Here is code of shared library in question:
#include <stdio.h>
extern long var;
void
shara_func(void)
{
printf("%ld\n", var);
}
Produce object code, then shared object(library):
gcc -fPIC -c lib1.c # produce PIC lib1.o
gcc -fPIC -shared lib1.o -o liblib1.so # produce PIC shared library
Disassemble shara_func in shared library:
objdump -d liblib1.so
...
00000000000006d0 <shara_func>:
6d0: 55 push %rbp
6d1: 48 89 e5 mov %rsp,%rbp
6d4: 48 8b 05 fd 08 20 00 mov 0x2008fd(%rip),%rax # 200fd8 <_DYNAMIC+0x1c8>
6db: 48 8b 00 mov (%rax),%rax
6de: 48 89 c6 mov %rax,%rsi
6e1: 48 8d 3d 19 00 00 00 lea 0x19(%rip),%rdi # 701 <_fini+0x9>
6e8: b8 00 00 00 00 mov $0x0,%eax
6ed: e8 be fe ff ff callq 5b0 <printf#plt>
6f2: 90 nop
6f3: 5d pop %rbp
6f4: c3 retq
...
I see that instruction at 0x6d4 address moves some address that is relative to PC to rax, I suppose that is the entry in GOT, GOT referenced relatively from PC to get address of external variable var at runtime(it is resolved at runtime depending where var was loaded).
Then after executing instruction at 0x6db we get external variable's actual content placed in rax, then move value from rax to rsi - second function parameter passed in register.
I was thinking that there is only one GOT in process memory, however,
see that library references GOT? How shared library knows offset to process's GOT when it(PIC library) does not know where in process memory it would be loaded? Or does each shared library has its own GOT that is loaded with her? I would be very glad if you clarify my confusion.
I was thinking that there is only one GOT in process memory, however, see that library references GOT?
We clearly see .got section as part of the library. With readelf we can find what are the sections of the library and how they are loaded:
readelf -e liblib1.so
...
Section Headers:
[21] .got PROGBITS 0000000000200fd0 00000fd0
0000000000000030 0000000000000008 WA 0 0 8
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000000078c 0x000000000000078c R E 200000
LOAD 0x0000000000000df8 0x0000000000200df8 0x0000000000200df8
0x0000000000000230 0x0000000000000238 RW 200000
...
Section to Segment mapping:
Segment Sections...
00 ... .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame
01 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
02 .dynamic
So, there is section .got, but runtime linker ld-linux.so.2 (registered as interpreter for dynamic ELFs) does not load sections; it loads segments as described by Program header with LOAD type. .got is part of segment 01 LOAD with RW flags. Other library will have own GOT (think about compiling liblib2.so from the similar source, it will not know anything about liblib1.so and will have own GOT); so it is "Global" only for the library; but not to the whole program image in memory after loading.
How shared library knows offset to process's GOT when it(PIC library) does not know where in process memory it would be loaded?
It is done by static linker when it takes several ELF objects and combine them all into one library. Linker will generate .got section and put it to some place with known offset from the library code (pc-relative, rip-relative). It writes instructions to program header, so the relative address is known and it is the only needed address to access own GOT.
When objdump is used with -r / -R flags, it will print information about relocations (static / dynamic) recorded in the ELF file or library; it can be combined with -d flag. lib1.o object had relocation here; no known offset to GOT, mov has all zero:
$ objdump -dr lib1.o
lib1.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <shara_func>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # b <shara_func+0xb>
7: R_X86_64_REX_GOTPCRELX var-0x4
b: 48 8b 00 mov (%rax),%rax
e: 48 89 c6 mov %rax,%rsi
In library file this was converted to relative address by gcc -shared (it calls ld variant collect2 inside):
$ objdump -d liblib1.so
liblib1.so: file format elf64-x86-64
00000000000006d0 <shara_func>:
6d0: 55 push %rbp
6d1: 48 89 e5 mov %rsp,%rbp
6d4: 48 8b 05 fd 08 20 00 mov 0x2008fd(%rip),%rax # 200fd8 <_DYNAMIC+0x1c8>
And finally, there is dynamic relocation into GOT to put here actual address of var (done by rtld - ld-linux.so.2):
$ objdump -R liblib1.so
liblib1.so: file format elf64-x86-64
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
...
0000000000200fd8 R_X86_64_GLOB_DAT var
Let's use your lib, adding executable with definition, compiling it and running with rtld debugging enabled:
$ cat main.c
long var;
int main(){
shara_func();
return 0;
}
$ gcc main.c -llib1 -L. -o main -Wl,-rpath=`pwd`
$ LD_DEBUG=all ./main 2>&1 |less
...
311: symbol=var; lookup in file=./main [0]
311: binding file /test3/liblib1.so [0] to ./main [0]: normal symbol `var'
So, linker was able to bind relocation for var to the "main" ELF file where it is defined:
$ gdb -q ./main
Reading symbols from ./main...(no debugging symbols found)...done.
(gdb) b main
Breakpoint 1 at 0x4006da
(gdb) r
Starting program: /test3/main
Breakpoint 1, 0x00000000004006da in main ()
(gdb) disassemble shara_func
Dump of assembler code for function shara_func:
0x00007ffff7bd56d0 <+0>: push %rbp
0x00007ffff7bd56d1 <+1>: mov %rsp,%rbp
0x00007ffff7bd56d4 <+4>: mov 0x2008fd(%rip),%rax # 0x7ffff7dd5fd8
0x00007ffff7bd56db <+11>: mov (%rax),%rax
0x00007ffff7bd56de <+14>: mov %rax,%rsi
No changes in mov in your func. rax after func+4 is 0x601040, it is third mapping of ./main according to /proc/$pid/maps:
00601000-00602000 rw-p 00001000 08:07 6691394 /test3/main
And it was loaded from main after this program header (readelf -e ./main)
LOAD 0x0000000000000df0 0x0000000000600df0 0x0000000000600df0
0x0000000000000248 0x0000000000000258 RW 200000
It is part of .bss section:
[26] .bss NOBITS 0000000000601038 00001038
0000000000000010 0000000000000000 WA 0 0 8
After stepping to func+11, we can check value in GOT:
(gdb) b shara_func
(gdb) r
(gdb) si
0x00007ffff7bd56db in shara_func () from /test3/liblib1.so
1: x/i $pc
=> 0x7ffff7bd56db <shara_func+11>: mov (%rax),%rax
(gdb) p $rip+0x2008fd
$6 = (void (*)()) 0x7ffff7dd5fd8
(gdb) x/2x 0x7ffff7dd5fd8
0x7ffff7dd5fd8: 0x00601040 0x00000000
Who did write correct value to this GOT entry?
(gdb) watch *0x7ffff7dd5fd8
Hardware watchpoint 2: *0x7ffff7dd5fd8
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /test3/main
Hardware watchpoint 2: *0x7ffff7dd5fd8
Old value = <unreadable>
New value = 6295616
0x00007ffff7de36bf in elf_machine_rela (..) at ../sysdeps/x86_64/dl-machine.h:435
(gdb) bt
#0 0x00007ffff7de36bf in elf_machine_rela (...) at ../sysdeps/x86_64/dl-machine.h:435
#1 elf_dynamic_do_Rela (...) at do-rel.h:137
#2 _dl_relocate_object (...) at dl-reloc.c:258
#3 0x00007ffff7ddaf5b in dl_main (...) at rtld.c:2072
#4 0x00007ffff7df0462 in _dl_sysdep_start (start_argptr=start_argptr#entry=0x7fffffffde20,
dl_main=dl_main#entry=0x7ffff7dd89a0 <dl_main>) at ../elf/dl-sysdep.c:249
#5 0x00007ffff7ddbe7a in _dl_start_final (arg=0x7fffffffde20) at rtld.c:307
#6 _dl_start (arg=0x7fffffffde20) at rtld.c:413
#7 0x00007ffff7dd7cc8 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) x/2x 0x7ffff7dd5fd8
0x7ffff7dd5fd8: 0x00601040 0x00000000
Runtime linker of glibc did (rtld.c), just before calling main - here is the source (bit different version) - http://code.metager.de/source/xref/gnu/glibc/sysdeps/x86_64/dl-machine.h
329 case R_X86_64_GLOB_DAT:
330 case R_X86_64_JUMP_SLOT:
331 *reloc_addr = value + reloc->r_addend;
332 break;
With reverse stepping we can get history of code and old value = 0:
(gdb) b _dl_relocate_object
(gdb) r
(gdb) dis 3
(gdb) target record-full
(gdb) c
(gdb) disp/i $pc
(gdb) rsi
(gdb) rsi
(gdb) rsi
(gdb) x/2x 0x7ffff7dd5fd8
0x7ffff7dd5fd8: 0x00000000 0x00000000
=> 0x7ffff7de36b8 <_dl_relocate_object+1560>: add 0x10(%rbx),%rax
=> 0x7ffff7de36bc <_dl_relocate_object+1564>: mov %rax,(%r10)
=> 0x7ffff7de36bf <_dl_relocate_object+1567>: nop
I wrote this little assembly file for amd64. What the code does is not important for this question.
.globl fib
fib: mov %edi,%ecx
xor %eax,%eax
jrcxz 1f
lea 1(%rax),%ebx
0: add %rbx,%rax
xchg %rax,%rbx
loop 0b
1: ret
Then I proceeded to assemble and then disassemble this on both Solaris and Linux.
Solaris
$ as -o y.o -xarch=amd64 -V y.s
as: Sun Compiler Common 12.1 SunOS_i386 Patch 141858-04 2009/12/08
$ dis y.o
disassembly for y.o
section .text
0x0: 8b cf movl %edi,%ecx
0x2: 33 c0 xorl %eax,%eax
0x4: e3 0a jcxz +0xa <0x10>
0x6: 8d 58 01 leal 0x1(%rax),%ebx
0x9: 48 03 c3 addq %rbx,%rax
0xc: 48 93 xchgq %rbx,%rax
0xe: e2 f9 loop -0x7 <0x9>
0x10: c3 ret
Linux
$ as --64 -o y.o -V y.s
GNU assembler version 2.22.90 (x86_64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.22.90.20120924
$ objdump -d y.o
y.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <fib>:
0: 89 f9 mov %edi,%ecx
2: 31 c0 xor %eax,%eax
4: e3 0a jrcxz 10 <fib+0x10>
6: 8d 58 01 lea 0x1(%rax),%ebx
9: 48 01 d8 add %rbx,%rax
c: 48 93 xchg %rax,%rbx
e: e2 f9 loop 9 <fib+0x9>
10: c3 retq
How comes the generated machine code is different? Sun as generates 8b cf for mov %edi,%ecx while gas generates 89 f9 for the very same instruction. Is this because of the various ways to encode the same instruction under x86 or do these two encodings really have a particular difference?
Some x86 instructions have multiple encodings that do the same thing. In particular, any instruction that acts on two registers can have the registers swapped and the direction bit in the instruction reversed.
Which one a given assembler/compiler picks simply depends on what the tool authors chose.
You've not specified the operand size for the mov, xor and add operations. This creates some ambiguity. The GNU assembler manual, i386 Mnemonics, mentions this:
If no suffix is specified by an instruction then as tries to fill in the missing suffix based on the destination register operand (the last one by convention). [ ... ] . Note that this is incompatible with the AT&T Unix assembler which assumes that a missing mnemonic suffix implies long operand size.
This implies the GNU assembler chooses differently - it'll pick the opcode with the R/M byte specifying the target operand (because the destination size is known/implied) while the AT&T one chooses the opcode where the R/M byte specifies the source operand (because the operand size is implied).
I've done that experiment though and given explicit operand sizes in your assembly source, and it doesn't change the GNU assembler output. There is, though, the other part of the documentation above,
Different encoding options can be specified via optional mnemonic suffix. `.s' suffix swaps 2 register operands in encoding when moving from one register to another.
which one can use; the following sourcecode, with GNU as, creates me the opcodes you got from Solaris as:
.globl fib
fib: movl.s %edi,%ecx
xorl.s %eax,%eax
jrcxz 1f
leal 1(%rax),%ebx
0: addq.s %rbx,%rax
xchgq %rax,%rbx
loop 0b
1: ret
I now understand how dynamic functions are referenced, by procedure linkage table like below:
Dump of assembler code for function foo#plt:
0x0000000000400528 <foo#plt+0>: jmpq *0x2004d2(%rip) # 0x600a00 <_GLOBAL_OFFSET_TABLE_+40>
0x000000000040052e <foo#plt+6>: pushq $0x2
0x0000000000400533 <foo#plt+11>: jmpq 0x4004f8
(gdb) disas 0x4004f8
No function contains specified address.
But I don't know how dynamic variables are referenced,though I found the values are populated in the GOT once started,but there's no stub like above,how does it work?
The dynamic loader relocates all references to variables before transferring control to the user program.
There is no "stub" for them, because once the user program starts executing, it is not possible for the loader to regain control and update variable addresses. If this isn't clear to you, then you have not really understood how the PLT lazy-resolution stub works.
Global variables are accessed indirectly, via a global offset table.
When compiling a program, the compiler generates code that performs
indirect accesses, and emits relocation information specifying the
entry in the global offset table being used.
The linker performs these relocations when creating the final
dynamically loadable object, resulting in machine code that does not
need further patching at load time.
To see this in action, consider the following code fragment.
int v1;
int f(void) { return !v1; }
The function f references a global v1. The machine code generated
for the function looks like the following (on an i386):
% gcc -c -fpic a.c
% objdump --disassemble --reloc a.o
[snip]
Disassembly of section .text:
00000000 <f>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: e8 fc ff ff ff call 4 <f+0x4>
4: R_386_PC32 __i686.get_pc_thunk.cx
8: 81 c1 02 00 00 00 add $0x2,%ecx
a: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
e: 8b 81 00 00 00 00 mov 0x0(%ecx),%eax
10: R_386_GOT32 v1
14: 8b 00 mov (%eax),%eax
16: 85 c0 test %eax,%eax
18: 0f 94 c0 sete %al
1b: 0f b6 c0 movzbl %al,%eax
1e: 5d pop %ebp
1f: c3 ret
Disassembly of section .text.__i686.get_pc_thunk.cx:
00000000 <__i686.get_pc_thunk.cx>:
0: 8b 0c 24 mov (%esp),%ecx
3: c3 ret
Machine code walk-through:
(Offsets 0x0 and 0x1) The standard function prologue.
(Offset 0x3) The call to __i686.get_pc_thunk.cx prepares for
PC-relative addressing by loading the address of the instruction
after the call into register %ecx.
(Offset 0x8) The value in %ecx is adjusted to point to the start
of the global offset table. This adjustment is signalled by the
relocation entry of type R_386_GOTPC.
(Offset 0xE) The address of global v1 is retrieved. The
R_386_GOT32 relocation supplies the offset of v1's entry from
the base of the global offset table.
(Offset 0x14) The value in v1 is retrieved into register %eax.
(Offsets 0x16--0x1F) The rest of the computation for function f.
In the final shared object, the linker patches the function's code to
the following:
% gcc -shared -o a.so a.o
% objdump --disassemble a.so
...snip...
0000044c <f>:
44c: 55 push %ebp
44d: 89 e5 mov %esp,%ebp
44f: e8 18 00 00 00 call 46c <__i686.get_pc_thunk.cx>
454: 81 c1 a0 1b 00 00 add $0x1ba0,%ecx
45a: 8b 81 f8 ff ff ff mov -0x8(%ecx),%eax
460: 8b 00 mov (%eax),%eax
462: 85 c0 test %eax,%eax
...snip...
Assuming that the object gets loaded at offset O in memory, the
call instruction at offset 0x44F will load O+0x454+0x1BA0, i.e.,
O+0x1FF4 into %ecx.
The instruction at offset 0x45A subtracts 8 from %ecx
to get the address of the slot for v1 in the global offset table,
i.e., the slot for v1 is at offset 0x1FEC from the start of the
shared object.
Looking at the dynamic relocation records for the shared object, we
see a relocation record instructing the runtime loader to store the
actual address for v1 at offset 0x1FEC.
% objdump -R a.so
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
...snip...
00001fec R_386_GLOB_DAT v1
...snip...
Further reading:
Pat Beirne's "Study of ELF loading and relocs" has more information about ELF relocations.
Is there a set of command-line options that will convince gcc to produce a flat binary file from a self-contained source file? For example, suppose the contents of foo.c are
static int f(int x)
{
int y = x*x;
return y+2;
}
No external references, nothing to export to the linker. I'd like to get a small file with just the machine instructions for this function, without any other decoration. Sort of like a (DOS) .COM file except 32-bit protected mode.
Try this out:
$ gcc -c test.c
$ objcopy -O binary -j .text test.o binfile
You can make sure it's correct with objdump:
$ objdump -d test.o
test.o: file format pe-i386
Disassembly of section .text:
00000000 <_f>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 04 sub $0x4,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
9: 0f af 45 08 imul 0x8(%ebp),%eax
d: 89 45 fc mov %eax,-0x4(%ebp)
10: 8b 45 fc mov -0x4(%ebp),%eax
13: 83 c0 02 add $0x2,%eax
16: c9 leave
17: c3 ret
And compare it with the binary file:
$ hexdump -C binfile
00000000 55 89 e5 83 ec 04 8b 45 08 0f af 45 08 89 45 fc |U......E...E..E.|
00000010 8b 45 fc 83 c0 02 c9 c3 |.E......|
00000018
You can pass options to the linker directly with -Wl,<linker option>
The relevant documentation is copied below from the man gcc
-Wl,option
Pass option as an option to the linker. If option contains commas, it is split into multiple options at the commas. You can use
this syntax to pass an argument to the option. For example,
-Wl,-Map,output.map passes -Map output.map to the linker. When using the GNU linker, you can also get the same effect with
-Wl,-Map=output.map.
So when compiling with gcc if you pass -Wl,--oformat=binary you will generate a binary file instead of the elf format. Where --oformat=binary tells ld to generate a binary file.
This removes the need to objcopy separately.
Note that --oformat=binary can be expressed as OUTPUT_FORMAT("binary") from within a linker script. If you want to deal with flat binaries, there's a big chance that you would benefit from high level of control that linker scripts provide.
You can use objcopy to pull the text segment out of the .o file or the a.out file.
$ cat q.c
f() {}
$ cc -S -O q.c
$ cat q.s
.file "q.c"
.text
.globl f
.type f, #function
f:
pushl %ebp
movl %esp, %ebp
popl %ebp
ret
.size f, .-f
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",#progbits
$ cc -c -O q.c
$ objcopy -O binary q.o q.bin
$ od -X q.bin
0000000 5de58955 000000c3
0000005
$ objdump -d q.o
q.o: file format elf32-i386
Disassembly of section .text:
00000000 <f>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 5d pop %ebp
4: c3 ret
The other answers are definitely the way to go. However, I had to specify additional command line arguments to objcopy in order for my output to be as expected. Note that I am developing 32-bit code on a 64-bit machine, hence the -m32 argument. Also, I like intel assembly syntax better, so you'll see that in the arguments as well.
$ cat test.c
int main() { return 0; }
$ gcc -nostdinc -m32 -masm=intel -Wall -c test.c -o test.o
$ objdump --disassemble --disassembler-options intel test.o
test.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: b8 00 00 00 00 mov eax,0x0
8: 5d pop ebp
9: c3 ret
Ok, here's where I had to specify that I specifically only wanted the .text section:
$ objcopy --only-section=.text --output-target binary test.o test.bin
$ hexdump -C test.bin
00000000 55 89 e5 b8 00 00 00 00 5d c3 |U.......].|
0000000a
It took me about 2 hours of reading and trying different options before I figured this out. Hopefully this saves someone else that time.