dynamic link attempts all yield statically linked binaries? Why? - linux

I'm following this tutorial to the "T" here about creating a dynamically-linked shared library in Linux, and when I follow the instructions, gcc seems to statically link the library instead.
The tutorial proposes 3 files: foo.c, foo.h, and main.c. Main includes foo.h and calls foo(), defined in foo.c.
I made one little change from the tutorial for debugging... my foo looks like this:
void foo(void) {
int i = 54321;
printf( "Shared lib: %d\n", i );
}
It tells me to compile using these 3 steps:
gcc -c -Wall -Werror -fpic foo.c
gcc -shared -o libfoo.so foo.o
gcc -L/home/username/foo -Wall -o test main.c -lfoo
When I run ./test, it works: I can see the "Shared lib: 54321" from foo(). In fact, it works so well that it still works if I delete libfoo.so. That seemed suspicious, so I ran objdump -S test and found this little guy in the executable:
000000000000068a <foo>:
68a: 55 push %rbp
68b: 48 89 e5 mov %rsp,%rbp
68e: 48 83 ec 10 sub $0x10,%rsp
692: c7 45 fc 31 d4 00 00 movl $0xd431,-0x4(%rbp)
^^^ there's my constant, 54321, in hex.
should be in the "dynamic" object, not here, right?
699: 8b 45 fc mov -0x4(%rbp),%eax
69c: 89 c6 mov %eax,%esi
69e: 48 8d 3d af 00 00 00 lea 0xaf(%rip),%rdi # 754 <_IO_stdin_used+0x4>
6a5: b8 00 00 00 00 mov $0x0,%eax
6aa: e8 b1 fe ff ff callq 560 <printf@plt>
6af: 90 nop
6b0: c9 leaveq
6b1: c3 retq
What am I doing wrong?
Thank you in advance...
P.S. Compiling on x86_64 Debian Stretch using version gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0

What am I doing wrong?
You most likely have libfoo.a in the /home/username/foo directory.
Or you accidentally used #include "foo.c" when #include "foo.h" was intended.
You can try to figure out where the definition of foo() is coming into test with:
gcc -L/home/username/foo -Wall -o test main.c -lfoo -Wl,-y,foo
which should show reference to foo coming from some /tmp/xyz.o and the definition coming from <somewhere>.
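As a complementary runtime check, the program can ask the kernel which files are mapped into it. The sketch below is a minimal, Linux-only helper; the name is_mapped is hypothetical and not part of the tutorial. It assumes /proc is mounted and simply searches /proc/self/maps for a substring.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if any line of /proc/self/maps contains `needle`,
 * 0 if no line does, and -1 if the file cannot be opened.
 * Linux-specific: relies on procfs being mounted at /proc. */
int is_mapped(const char *needle)
{
    char line[4096];
    int found = 0;
    FILE *maps = fopen("/proc/self/maps", "r");

    if (maps == NULL)
        return -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, needle) != NULL) {
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

Calling is_mapped("libfoo.so") from main() in the question's test program would be expected to return 0 if foo() was linked into the executable itself, and 1 if libfoo.so was really loaded by the dynamic loader.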

Stack canaries can be disabled by compiler?

Who is responsible for inserting the stack canaries in the stack? Is it the OS?
If yes, how can the gcc compiler disable them with the -fno-stack-protector option? Or does that option only set a flag in the binary that tells the OS not to insert canaries in the stack when the binary is loaded at runtime?
EDIT: one more question
Who checks whether the canary values were changed during execution?
Again, if they are inserted by the compiler, how can they be checked by the OS? If they are inserted by the OS, how can they be disabled by the compiler (main question)?
Who is responsible for inserting the stack canaries in the stack?
The compiler. The code for creating and checking stack canaries is a subset of the code generated by the compiler from the program source code.
For GCC:
-fstack-protector
Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.
The aforementioned "guard variable" is commonly referred to as a canary:
The basic idea behind stack protection is to push a "canary" (a randomly chosen integer) on the stack just after the function return pointer has been pushed. The canary value is then checked before the function returns; if it has changed, the program will abort. Generally, stack buffer overflow (aka "stack smashing") attacks will have to change the value of the canary as they write beyond the end of the buffer before they can get to the return pointer. Since the value of the canary is unknown to the attacker, it cannot be replaced by the attack. Thus, the stack protection allows the program to abort when that happens rather than return to wherever the attacker wanted it to go.1
Example program:
Source code:
int test(int i) {
return i;
}
int main(void) {
int x;
int i = 10;
x = test(i);
return x;
}
Function from binary compiled without -fstack-protector-all:
$ objdump -dj .text test | grep -A7 "<test>:"
00000000004004ed <test>:
4004ed: 55 push %rbp
4004ee: 48 89 e5 mov %rsp,%rbp
4004f1: 89 7d fc mov %edi,-0x4(%rbp)
4004f4: 8b 45 fc mov -0x4(%rbp),%eax
4004f7: 5d pop %rbp
4004f8: c3 retq
Function from binary compiled with -fstack-protector-all:
$ objdump -dj .text protected_test | grep -A20 "<test>:"
000000000040055d <test>:
40055d: 55 push %rbp
40055e: 48 89 e5 mov %rsp,%rbp
400561: 48 83 ec 20 sub $0x20,%rsp
400565: 89 7d ec mov %edi,-0x14(%rbp)
400568: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax <- get guard variable value
40056f: 00 00
400571: 48 89 45 f8 mov %rax,-0x8(%rbp) <- save guard variable on stack
400575: 31 c0 xor %eax,%eax
400577: 8b 45 ec mov -0x14(%rbp),%eax
40057a: 48 8b 55 f8 mov -0x8(%rbp),%rdx <- move it to register
40057e: 64 48 33 14 25 28 00 xor %fs:0x28,%rdx <- check it against original
400585: 00 00
400587: 74 05 je 40058e <test+0x31>
400589: e8 b2 fe ff ff callq 400440 <__stack_chk_fail@plt>
40058e: c9 leaveq
40058f: c3 retq
1. "Strong" stack protection for GCC
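To make the store-then-compare pattern in the protected listing concrete, here is a hand-written model of it in C. It is only an emulation under stated assumptions: the struct frame layout, the GUARD_VALUE constant, and the function copy_and_check are all hypothetical. Real code reads a random per-process value from %fs:0x28, and the compiler decides where the guard sits in the frame.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the guard that real code loads from %fs:0x28. */
static const uint64_t GUARD_VALUE = 0x00c0ffee0badf00dULL;

/* Models a protected frame: the buffer comes first, the guard right after it. */
struct frame {
    char     buf[8];
    uint64_t guard;
};

/* Copies up to sizeof(struct frame) bytes starting at buf, then reports
 * whether the guard survived: 1 = intact, 0 = overwritten. */
int copy_and_check(const char *src, size_t n)
{
    struct frame fr;

    fr.guard = GUARD_VALUE;          /* "prologue": store the guard         */
    if (n > sizeof fr)
        n = sizeof fr;               /* keep the demo itself memory-safe    */
    memcpy(&fr, src, n);             /* a copy that may run past buf        */
    return fr.guard == GUARD_VALUE;  /* "epilogue": compare before return   */
}
```

An 8-byte copy leaves the guard untouched, while a 16-byte copy overwrites it. In compiler-generated code the failing branch calls __stack_chk_fail instead of returning 0.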

Why GOT entry offset appears wrong?

I wrote simple shared library:
extern void some_func(void);
void
function(void)
{
some_func();
}
Compiled/built:
gcc -fPIC -mcmodel=large -c test.c -o test.o
gcc -fPIC -shared test.o -o libtest.so
Disassembled, to see how some_func is referenced:
$ objdump -d libtest.so
00000000000006a0 <function>:
6a0: 55 push %rbp
6a1: 48 89 e5 mov %rsp,%rbp
6a4: 41 57 push %r15
6a6: 48 83 ec 08 sub $0x8,%rsp
6aa: 48 8d 05 f9 ff ff ff lea -0x7(%rip),%rax # 6aa <function+0xa>
6b1: 49 bb 56 09 20 00 00 movabs $0x200956,%r11
6b8: 00 00 00
6bb: 4c 01 d8 add %r11,%rax
6be: 49 89 c7 mov %rax,%r15
6c1: 48 ba 80 f5 df ff ff movabs $0xffffffffffdff580,%rdx
6c8: ff ff ff
6cb: 48 01 c2 add %rax,%rdx
6ce: ff d2 callq *%rdx
6d0: 90 nop
6d1: 48 83 c4 08 add $0x8,%rsp
6d5: 41 5f pop %r15
6d7: 5d pop %rbp
6d8: c3 retq
Looked where .got.plt is located:
$ readelf -S libtest.so
...
[21] .got.plt PROGBITS 0000000000201000 00001000
0000000000000020 0000000000000008 WA 0 0 8
...
What is the relocation:
$ readelf -r libtest.so
Relocation section '.rela.plt' at offset 0x538 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000201018 000400000007 R_X86_64_JUMP_SLO 0000000000000000 some_func + 0
In 6aa-6bb we get absolute location of GOT: 6aa + 0x200956 = 0x201000
That agrees with readelf -S libtest.so 's output.
We skip the 3 reserved entries in the GOT (8 bytes each, used for function binding) and determine that some_func's absolute address should be found at offset +0x18 (the fourth entry of the GOT) at runtime.
That agrees with readelf -r libtest.so.
But 6c1 instruction in objdump's disassembly shows:
movabs $0xfff...dff580, %rdx
I expect that source operand will hold +0x18 (offset from GOT, its address located at rax), but instead it has some large negative number.
Could you explain why it shows that number and not 0x18?
There are two kinds of relocations, static and dynamic (1): one for the static linker ld and the other for the loader (dynamic linker, rtld), which is ld-linux.so.2 for Linux's glibc 2.* (see Dynamic Linking and Loading, 1999, or Static Linkers and Dynamic Link Loaders).
When you use objdump to dump relocations, it has the -r option for static relocations and -R for dynamic relocations.
Your case is not just the GOT, it is GOT.PLT: the part of the GOT used for procedure linkage. This kind of access uses dynamic relocations. So you should check the output of objdump -dR libtest.so; it will show you both the disassembly and the dynamic relocations in it.
The line you cited from readelf -r libtest.so is just for the PLT table, not for the code.
http://www.airs.com/blog/archives/41
For function calls, the program linker will set up a PLT entry to look like this:
jmp *offset(%ebx)
pushl #index
jmp first_plt_entry
The program linker will allocate an entry in the GOT for each entry in
the PLT. It will create a dynamic relocation for the GOT entry of type
JMP_SLOT. It will initialize the GOT entry to the base address of the
shared library plus the address of the second instruction in the code
sequence above. When the dynamic linker does the initial lazy binding
on a JMP_SLOT reloc, it will simply add the difference between the
shared library load address and the shared library base address to the
GOT entry. The effect is that the first jmp instruction will jump to
the second instruction, which will push the index entry and branch to
the first PLT entry. The first PLT entry is special, and looks like this:
pushl 4(%ebx)
jmp *8(%ebx)
This references the second and third entries in the GOT. The dynamic
linker will initialize them to have appropriate values for a callback
into the dynamic linker itself. The dynamic linker will use the index
pushed by the first code sequence to find the JMP_SLOT relocation.
When the dynamic linker determines the function to be called, it will
store the address of the function into the GOT entry references by the
first code sequence. Thus, the next time the function is called, the
jmp instruction will branch directly to the right code.
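The arithmetic in the question's listing can be replayed directly. The sketch below uses only constants that appear in the disassembly; the function names are hypothetical. The sum wraps modulo 2^64, so the "large negative number" yields a small address below the GOT. That address (0x580) is presumably the PLT stub for some_func, which would be consistent with the large code model using a PLT-relative-to-GOT offset (R_X86_64_PLTOFF64) rather than a GOT slot offset. That interpretation is an assumption: the listing does not show the .plt section.

```c
#include <stdint.h>

/* lea -0x7(%rip),%rax at 0x6aa: %rip holds the next instruction's
 * address (0x6b1), so %rax = 0x6aa. Then movabs $0x200956,%r11 and
 * add %r11,%rax produce the GOT base. */
uint64_t got_base(void)
{
    uint64_t rax = 0x6b1 - 0x7;
    return rax + 0x200956;
}

/* movabs $0xffffffffffdff580,%rdx ; add %rax,%rdx ; callq *%rdx
 * Unsigned 64-bit addition wraps, which is what the CPU does here. */
uint64_t call_target(void)
{
    return got_base() + 0xffffffffffdff580ULL;
}
```

Here got_base() evaluates to 0x201000, matching readelf -S, and call_target() evaluates to 0x580, an address in the code area of the library rather than a slot inside the GOT.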

How does one link NASM program to libc via ld?

I have the following program for NASM (ArchLinux i686):
SECTION .data
LC1: db "library call", 0
SECTION .text
extern exit
extern printf
;global main
;main:
global _start
_start:
push LC1
call printf
push 0
call exit
Which is assembled with command:
nasm -f elf libcall.asm
If I comment out the two lines with _start, uncomment the two lines with main, and then assemble and link with the command:
gcc libcall.o -o libcall
then the program runs OK. But if I assemble the code with the _start entry point and link with the command:
ld libcall.o -o libcall -lc
Then after launching the program in bash (via the command ./libcall) the following error message is returned:
bash: ./libcall: No such file or directory
Although the libcall file does exist. objdump shows the following:
[al libcall ]$ objdump -d libcall
libcall: file format elf32-i386
Disassembly of section .plt:
08048190 <printf@plt-0x10>:
8048190: ff 35 78 92 04 08 pushl 0x8049278
8048196: ff 25 7c 92 04 08 jmp *0x804927c
804819c: 00 00 add %al,(%eax)
...
080481a0 <printf@plt>:
80481a0: ff 25 80 92 04 08 jmp *0x8049280
80481a6: 68 00 00 00 00 push $0x0
80481ab: e9 e0 ff ff ff jmp 8048190 <printf@plt-0x10>
080481b0 <exit@plt>:
80481b0: ff 25 84 92 04 08 jmp *0x8049284
80481b6: 68 08 00 00 00 push $0x8
80481bb: e9 d0 ff ff ff jmp 8048190 <printf@plt-0x10>
Disassembly of section .text:
080481c0 <_start>:
80481c0: 68 88 92 04 08 push $0x8049288
80481c5: e8 d6 ff ff ff call 80481a0 <printf@plt>
80481ca: 6a 00 push $0x0
80481cc: e8 df ff ff ff call 80481b0 <exit@plt>
How should the NASM assembly code properly be linked to libc via ld?
There are some parts of libc/crt that come in object files that you also need to link. Additionally, you need to specify some options, such as the dynamic loader (a.k.a. interpreter) to use, which is probably the reason for your issue. Just use gcc to do the right thing for you. If you are interested, you can run gcc with -v and then you will see the horrible command line it uses to link. You have been warned ;)
PS: you should use the main entry point that you have commented out.
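If the interpreter path is indeed the culprit, the "No such file or directory" message most likely refers to the loader named inside the binary rather than to ./libcall itself. The sketch below is one way to print that path (readelf -l reports the same information). It is a minimal reader under stated assumptions: the function names are hypothetical, and it assumes the ELF file uses the host's byte order.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Reads `len` bytes at file offset `off` into `out`, NUL-terminated. */
static int copy_string_at(FILE *f, long off, size_t len, char *out, size_t outlen)
{
    if (outlen == 0)
        return -1;
    if (len >= outlen)
        len = outlen - 1;                       /* truncate to fit */
    if (fseek(f, off, SEEK_SET) != 0 || fread(out, 1, len, f) != len)
        return -1;
    out[len] = '\0';
    return 1;
}

/* Walks the program headers and copies the PT_INTERP string, if any. */
#define SCAN_PHDRS(EHDR, PHDR)                                             \
    do {                                                                   \
        EHDR eh;                                                           \
        PHDR ph;                                                           \
        if (fread(&eh, sizeof eh, 1, f) != 1)                              \
            break;                                                         \
        rc = 0; /* header is readable; no PT_INTERP seen yet */            \
        for (unsigned i = 0; i < eh.e_phnum; i++) {                        \
            long off = (long)eh.e_phoff + (long)i * eh.e_phentsize;        \
            if (fseek(f, off, SEEK_SET) != 0 ||                            \
                fread(&ph, sizeof ph, 1, f) != 1) {                        \
                rc = -1;                                                   \
                break;                                                     \
            }                                                              \
            if (ph.p_type == PT_INTERP) {                                  \
                rc = copy_string_at(f, (long)ph.p_offset,                  \
                                    (size_t)ph.p_filesz, out, outlen);     \
                break;                                                     \
            }                                                              \
        }                                                                  \
    } while (0)

/* Returns 1 and fills `out` if the file names an interpreter,
 * 0 if it has no PT_INTERP segment, -1 on any error. */
int read_interp(const char *path, char *out, size_t outlen)
{
    unsigned char ident[EI_NIDENT];
    int rc = -1;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    if (fread(ident, 1, EI_NIDENT, f) == EI_NIDENT &&
        memcmp(ident, ELFMAG, SELFMAG) == 0) {
        rewind(f);
        if (ident[EI_CLASS] == ELFCLASS32)
            SCAN_PHDRS(Elf32_Ehdr, Elf32_Phdr);
        else if (ident[EI_CLASS] == ELFCLASS64)
            SCAN_PHDRS(Elf64_Ehdr, Elf64_Phdr);
    }
    fclose(f);
    return rc;
}
```

Running read_interp on the ld-linked libcall and on the gcc-linked one, then checking whether each reported path exists on disk, should show whether the two links chose different loaders.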

How are variables in shared libraries referenced by loader?

I now understand how dynamic functions are referenced, by procedure linkage table like below:
Dump of assembler code for function foo@plt:
0x0000000000400528 <foo@plt+0>: jmpq *0x2004d2(%rip) # 0x600a00 <_GLOBAL_OFFSET_TABLE_+40>
0x000000000040052e <foo@plt+6>: pushq $0x2
0x0000000000400533 <foo@plt+11>: jmpq 0x4004f8
(gdb) disas 0x4004f8
No function contains specified address.
But I don't know how dynamic variables are referenced. I found that the values are populated in the GOT once the program has started, but there's no stub like the one above. How does it work?
The dynamic loader relocates all references to variables before transferring control to the user program.
There is no "stub" for them, because once the user program starts executing, it is not possible for the loader to regain control and update variable addresses. If this isn't clear to you, then you have not really understood how the PLT lazy-resolution stub works.
Global variables are accessed indirectly, via a global offset table.
When compiling a program, the compiler generates code that performs
indirect accesses, and emits relocation information specifying the
entry in the global offset table being used.
The linker performs these relocations when creating the final
dynamically loadable object, resulting in machine code that does not
need further patching at load time.
To see this in action, consider the following code fragment.
int v1;
int f(void) { return !v1; }
The function f references a global v1. The machine code generated
for the function looks like the following (on an i386):
% gcc -c -fpic a.c
% objdump --disassemble --reloc a.o
[snip]
Disassembly of section .text:
00000000 <f>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: e8 fc ff ff ff call 4 <f+0x4>
4: R_386_PC32 __i686.get_pc_thunk.cx
8: 81 c1 02 00 00 00 add $0x2,%ecx
a: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
e: 8b 81 00 00 00 00 mov 0x0(%ecx),%eax
10: R_386_GOT32 v1
14: 8b 00 mov (%eax),%eax
16: 85 c0 test %eax,%eax
18: 0f 94 c0 sete %al
1b: 0f b6 c0 movzbl %al,%eax
1e: 5d pop %ebp
1f: c3 ret
Disassembly of section .text.__i686.get_pc_thunk.cx:
00000000 <__i686.get_pc_thunk.cx>:
0: 8b 0c 24 mov (%esp),%ecx
3: c3 ret
Machine code walk-through:
(Offsets 0x0 and 0x1) The standard function prologue.
(Offset 0x3) The call to __i686.get_pc_thunk.cx prepares for
PC-relative addressing by loading the address of the instruction
after the call into register %ecx.
(Offset 0x8) The value in %ecx is adjusted to point to the start
of the global offset table. This adjustment is signalled by the
relocation entry of type R_386_GOTPC.
(Offset 0xE) The address of global v1 is retrieved. The
R_386_GOT32 relocation supplies the offset of v1's entry from
the base of the global offset table.
(Offset 0x14) The value in v1 is retrieved into register %eax.
(Offsets 0x16--0x1F) The rest of the computation for function f.
In the final shared object, the linker patches the function's code to
the following:
% gcc -shared -o a.so a.o
% objdump --disassemble a.so
...snip...
0000044c <f>:
44c: 55 push %ebp
44d: 89 e5 mov %esp,%ebp
44f: e8 18 00 00 00 call 46c <__i686.get_pc_thunk.cx>
454: 81 c1 a0 1b 00 00 add $0x1ba0,%ecx
45a: 8b 81 f8 ff ff ff mov -0x8(%ecx),%eax
460: 8b 00 mov (%eax),%eax
462: 85 c0 test %eax,%eax
...snip...
Assuming that the object gets loaded at offset O in memory, the
call instruction at offset 0x44F will load O+0x454+0x1BA0, i.e.,
O+0x1FF4 into %ecx.
The instruction at offset 0x45A subtracts 8 from %ecx
to get the address of the slot for v1 in the global offset table,
i.e., the slot for v1 is at offset 0x1FEC from the start of the
shared object.
Looking at the dynamic relocation records for the shared object, we
see a relocation record instructing the runtime loader to store the
actual address for v1 at offset 0x1FEC.
% objdump -R a.so
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
...snip...
00001fec R_386_GLOB_DAT v1
...snip...
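The indirection described above can be modelled in plain C. This is a conceptual sketch only: got_slot_v1 and loader_relocate are hypothetical stand-ins for the GOT slot at 0x1FEC and for the runtime loader processing the R_386_GLOB_DAT record. The real mechanism happens in the loader before the program starts, not in user code.

```c
/* The variable defined somewhere in the process. */
static int v1;

/* Stand-in for v1's slot in the global offset table; it is empty
 * until the "loader" fills it in. */
static int *got_slot_v1;

/* Stand-in for the runtime loader applying the R_386_GLOB_DAT record:
 * it stores the actual address of v1 into the slot, once, at startup. */
void loader_relocate(void)
{
    got_slot_v1 = &v1;
}

/* Same computation as f in the listing: load the address from the
 * slot, then load the value through that address. */
int f(void)
{
    return !*got_slot_v1;
}
```

Because the slot is filled before f ever runs, no lazy stub is needed: every access is just two loads, mirroring the mov -0x8(%ecx),%eax and mov (%eax),%eax pair in the patched code.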
Further reading:
Pat Beirne's "Study of ELF loading and relocs" has more information about ELF relocations.

Is there a way to get gcc to output raw binary?

Is there a set of command-line options that will convince gcc to produce a flat binary file from a self-contained source file? For example, suppose the contents of foo.c are
static int f(int x)
{
int y = x*x;
return y+2;
}
No external references, nothing to export to the linker. I'd like to get a small file with just the machine instructions for this function, without any other decoration. Sort of like a (DOS) .COM file except 32-bit protected mode.
Try this out:
$ gcc -c test.c
$ objcopy -O binary -j .text test.o binfile
You can make sure it's correct with objdump:
$ objdump -d test.o
test.o: file format pe-i386
Disassembly of section .text:
00000000 <_f>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 04 sub $0x4,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
9: 0f af 45 08 imul 0x8(%ebp),%eax
d: 89 45 fc mov %eax,-0x4(%ebp)
10: 8b 45 fc mov -0x4(%ebp),%eax
13: 83 c0 02 add $0x2,%eax
16: c9 leave
17: c3 ret
And compare it with the binary file:
$ hexdump -C binfile
00000000 55 89 e5 83 ec 04 8b 45 08 0f af 45 08 89 45 fc |U......E...E..E.|
00000010 8b 45 fc 83 c0 02 c9 c3 |.E......|
00000018
You can pass options to the linker directly with -Wl,<linker option>
The relevant documentation is copied below from the man gcc
-Wl,option
Pass option as an option to the linker. If option contains commas, it is split into multiple options at the commas. You can use
this syntax to pass an argument to the option. For example,
-Wl,-Map,output.map passes -Map output.map to the linker. When using the GNU linker, you can also get the same effect with
-Wl,-Map=output.map.
So when compiling with gcc if you pass -Wl,--oformat=binary you will generate a binary file instead of the elf format. Where --oformat=binary tells ld to generate a binary file.
This removes the need to objcopy separately.
Note that --oformat=binary can be expressed as OUTPUT_FORMAT("binary") from within a linker script. If you want to deal with flat binaries, there's a big chance that you would benefit from high level of control that linker scripts provide.
You can use objcopy to pull the text segment out of the .o file or the a.out file.
$ cat q.c
f() {}
$ cc -S -O q.c
$ cat q.s
.file "q.c"
.text
.globl f
.type f, @function
f:
pushl %ebp
movl %esp, %ebp
popl %ebp
ret
.size f, .-f
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",@progbits
$ cc -c -O q.c
$ objcopy -O binary q.o q.bin
$ od -X q.bin
0000000 5de58955 000000c3
0000005
$ objdump -d q.o
q.o: file format elf32-i386
Disassembly of section .text:
00000000 <f>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 5d pop %ebp
4: c3 ret
The other answers are definitely the way to go. However, I had to specify additional command line arguments to objcopy in order for my output to be as expected. Note that I am developing 32-bit code on a 64-bit machine, hence the -m32 argument. Also, I like intel assembly syntax better, so you'll see that in the arguments as well.
$ cat test.c
int main() { return 0; }
$ gcc -nostdinc -m32 -masm=intel -Wall -c test.c -o test.o
$ objdump --disassemble --disassembler-options intel test.o
test.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: b8 00 00 00 00 mov eax,0x0
8: 5d pop ebp
9: c3 ret
Ok, here's where I had to specify that I specifically only wanted the .text section:
$ objcopy --only-section=.text --output-target binary test.o test.bin
$ hexdump -C test.bin
00000000 55 89 e5 b8 00 00 00 00 5d c3 |U.......].|
0000000a
It took me about 2 hours of reading and trying different options before I figured this out. Hopefully this saves someone else that time.
