How to determine the Stack space after overwriting the EIP - security

I am learning about stack-based buffer overflows on x86 Linux.
After overwriting EIP with some value, how can I determine the size of the stack space from GDB?
(gdb) info registers
eax 0x1 1
ecx 0xffffd7a0 -10336
edx 0xffffd17b -11909
ebx 0xff5f8948 -10516152
esp 0xffffd180 0xffffd180
ebp 0x80ad26dc 0x80ad26dc
esi 0xf7fc2000 -134471680
edi 0x0 0
eip 0x66666666 0x66666666
eflags 0x10282 [ SF IF RF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb) info frame
Stack level 0, frame at 0xffffd184:
eip = 0x66666666; saved eip = 0xffffd300
called by frame at 0x80ad26e4
Arglist at 0xffffd17c, args:
Locals at 0xffffd17c, Previous frame's sp is 0xffffd184
Saved registers:
eip at 0xffffd180
(gdb) info stack
#0 0x66666666 in ?? ()
#1 0xffffd300 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
EDIT: Thanks for the answers. The GDB command that gives the right answer to this question is info proc:
(gdb) info proc all
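For reference, the size of the stack region is just the end address minus the start address of the [stack] mapping. Here is a minimal Python sketch of that arithmetic, assuming a line in /proc/<pid>/maps format. The sample line and its addresses are made up for illustration, and parse_mapping is a hypothetical helper name; only the esp value comes from the register dump above.

```python
def parse_mapping(maps_line):
    """Return (start, end) of one /proc/<pid>/maps entry as integers."""
    address_range = maps_line.split()[0]          # e.g. "fffdd000-ffffe000"
    start_text, end_text = address_range.split("-")
    return int(start_text, 16), int(end_text, 16)


# Hypothetical [stack] line; real addresses differ from run to run.
stack_line = "fffdd000-ffffe000 rw-p 00000000 00:00 0          [stack]"
start, end = parse_mapping(stack_line)
esp = 0xffffd180  # esp from the register dump in the question

mapped_size = end - start   # total size of the stack mapping
in_use = end - esp          # bytes between esp and the top of the mapping
headroom = esp - start      # bytes left below esp inside the mapping

print(hex(mapped_size), hex(in_use), hex(headroom))
```

The three numbers answer slightly different questions: how big the mapping is, how much of it sits above esp, and how far esp can still move down before leaving the mapping.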

You need to look at the stack itself — the memory of the stack.  Try decoding the stack yourself, maybe start with something like x/20x $esp to dump the stack.
For one, you're looking for the return address stored in the stack: if you put a breakpoint on the first instruction of the function, then the return address is the top thing on the stack.  So, print the stack before executing a single instruction of the function, and that will tell you both the stack location and the value of the return address on the stack.
As you run the initial part of the function, it will allocate stack space for the buffer (assuming it is a local array variable).
Then you want to see where the buffer is located within that stack space. There are two ways to do this. Either run the code and dump the stack with the same command immediately after the input operation to see where your input string winds up, or disassemble the code and see what stack address it passes to gets or scanf. Either approach should tell you where the buffer starts, and you can use both to validate each other.
Once you know where the buffer starts and where the return address is stored, you can tell how many characters (bytes) of input it takes to overwrite the return address.
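The last step is plain subtraction, which can be sketched in Python. This is only an illustration under assumptions: the buffer address is invented, and the return address slot is taken to be 4 bytes below the esp shown in the question (0xffffd180), which holds if the function returned with a plain ret. The helper names are hypothetical.

```python
import struct


def offset_to_return_address(buffer_start, return_slot):
    """Number of input bytes needed before the saved return address is reached."""
    return return_slot - buffer_start


def build_payload(offset, new_eip):
    """Filler up to the return address, then the new EIP as a little-endian dword."""
    return b"A" * offset + struct.pack("<I", new_eip)


return_slot = 0xffffd17c    # assumed: esp after ret (0xffffd180) minus 4
buffer_start = 0xffffd130   # hypothetical buffer address found with x/20x $esp

offset = offset_to_return_address(buffer_start, return_slot)
payload = build_payload(offset, 0x66666666)

print(offset)         # filler length in bytes
print(payload[-4:])   # the four bytes that end up in EIP
```

Note that 0x66666666 is the ASCII string "ffff", so an EIP like the one in the question usually means four consecutive 'f' characters of the input landed on the return address.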

Related

Unable to understand the result of x (examine memory) in gdb

I am trying to understand how a simple piece of asm code works. My task is to build the stack, that is, the list of values pointed to by rsp throughout execution.
In gdb, after setting a breakpoint at main, I use x/10xg $rsp to display 10 memory words starting at rsp. But since the results seem to be shown in 2x32-bit form rather than 1x64, I am unable to understand what values rsp is taking.
My goal here is to reconstruct the entire stack of the program, to see what goes where and to understand the order of execution of the program.
What I am confused about is:
- Why doesn't x have a specifier to show results in 1x64-bit form?
- How do I achieve my goal of reconstructing the stack of the program?
Here's my asm :
0x0000555555555170 <+0>: endbr64
0x0000555555555174 <+4>: push rbp
0x0000555555555175 <+5>: mov rbp,rsp
=> 0x0000555555555178 <+8>: mov eax,0x0
0x000055555555517d <+13>: call 0x55555555515c <func>
0x0000555555555182 <+18>: pop rbp
0x0000555555555183 <+19>: ret
and the output of x/10xg $rsp when ip is at line 4 is :
0x7fffffffdd30: 0x0000000000000000 0x00007ffff7dd80b3
0x7fffffffdd40: 0x00007ffff7ffc620 0x00007fffffffde28
0x7fffffffdd50: 0x0000000100000000 0x0000555555555170
0x7fffffffdd60: 0x0000555555555190 0x1706e24ed60a5880
0x7fffffffdd70: 0x0000555555555040 0x00007fffffffde20
Shouldn't the value of rsp be the address of the next instruction, which is 0x0000555555555178?
I can see something similar to the memory addresses of the code at higher addresses of the stack, but since it is split into 2x32-bit form I am unable to easily understand the values on the stack.
Also, am I approaching this the correct way? I am really confused here, sorry if my question sounds stupid.
gdb version:
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
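A note on reading that dump: with the g size letter, every column is already one full 64-bit (8-byte) value. Each row simply shows two of them, one at the printed address and one at that address plus 8. The Python sketch below rebuilds an address-to-value table from the first two rows of the output above; parse_giant_words is a hypothetical helper name, and the interpretation of the two top slots assumes the standard push rbp prologue shown in the disassembly.

```python
def parse_giant_words(dump):
    """Map each stack address to the 8-byte value gdb printed for it."""
    stack = {}
    for line in dump.strip().splitlines():
        address_text, values_text = line.split(":")
        address = int(address_text, 16)
        for index, word in enumerate(values_text.split()):
            # The second column lives 8 bytes above the first.
            stack[address + 8 * index] = int(word, 16)
    return stack


dump = """
0x7fffffffdd30: 0x0000000000000000 0x00007ffff7dd80b3
0x7fffffffdd40: 0x00007ffff7ffc620 0x00007fffffffde28
"""

stack = parse_giant_words(dump)
rsp = 0x7fffffffdd30

print(hex(stack[rsp]))       # value pushed by "push rbp"
print(hex(stack[rsp + 8]))   # return address of main, pushed by its caller
```

So rsp itself holds a stack address, not a code address. The code address to look for is the return address stored on the stack, here at rsp+8, and it points into the caller of main rather than to the next instruction of main.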

How does the stack frame work?

I am reading CS:APP and some of the x86-64 code confuses me.
The book says "pushq %rbp" is equivalent to:
subq $8,%rsp
movq %rbp,(%rsp)
The C code is:
long P(long x, long y)
{
    long u = Q(y);
    long v = Q(x);
    return u + v;
}
And part of the assembly code the book gives is:
pushq %rbp
pushq %rbx
subq $8,%rsp
The 'subq' confuses me.
Why is it there?
The stack is a block of memory which grows down. The rsp/esp register points to a location in memory called the top of the stack. All memory above it (at higher addresses, up to the stack base) is occupied by things placed on the stack, and all memory below it is free.
If you want to put something on the stack, you decrease the rsp register (that is what the sub instruction does) by the number of bytes you need, and rsp then points to the newly reserved area.
Let's look at a simple example:
rsp points to address 100. As said, all memory above address 100 is used, and memory below 100 is free. So if you need 4 bytes, you decrease rsp by 4 so that it points to 96. Having just decreased rsp, you know that memory cells 96, 97, 98 and 99 are yours and you can use them. When you need more bytes on the stack, you can decrease rsp again to get more.
There are two ways of putting things on the stack.
1. You can decrease rsp as shown above.
2. You can use the push instruction, which does the same in one step: push rax decreases rsp by 8 bytes (the size of the rax register) and then stores its value in the reserved area.
Sometimes the rbp register is also used to operate on the stack.
If you need a bigger area on the stack, for example for local variables, you reserve the required amount on the stack and save the current rsp value into rbp. So rbp is a kind of bookmark remembering where your area is. Then you can push more things on the stack without losing track of where the allocated area was.
Before leaving a function, everything it placed on the stack needs to be removed. This is done by the pop instruction, which is the opposite of push: it copies the value at the top of the stack into a register and then increases rsp. Or you can just increase rsp if you do not need to restore register values.
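To make the bookkeeping concrete, here is a toy Python model of rsp with push, pop and sub, replaying the three instructions from the question. It is a sketch, not how the CPU is implemented. The starting rsp is made up, chosen so that it is 8 modulo 16, which is what the x86-64 System V ABI gives at function entry because the call has just pushed an 8-byte return address. Under that assumption, the extra subq $8 is most likely there to bring rsp back to a 16-byte boundary before P calls Q.

```python
class Stack:
    """Toy model of a downward-growing x86-64 stack."""

    def __init__(self, rsp):
        self.rsp = rsp
        self.memory = {}  # address -> 8-byte value

    def sub(self, size):
        """Reserve space: subq $size, %rsp."""
        self.rsp -= size

    def push(self, value):
        """pushq: decrement rsp by 8, then store the value at the new top."""
        self.rsp -= 8
        self.memory[self.rsp] = value

    def pop(self):
        """popq: read the value at the top, then increment rsp by 8."""
        value = self.memory[self.rsp]
        self.rsp += 8
        return value


stack = Stack(rsp=0x7fffffffe008)   # hypothetical rsp on entry to P
stack.push(0x1111)                  # pushq %rbp (placeholder value)
stack.push(0x2222)                  # pushq %rbx (placeholder value)
stack.sub(8)                        # subq $8, %rsp

print(hex(stack.rsp), stack.rsp % 16)
```

Two pushes move rsp down by 16, which leaves it still 8 away from a 16-byte boundary. The final 8-byte subtraction fixes that without storing anything.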

After entering _start, is rsp aligned?

When a program enters the _start routine at startup, is the stack pointer aligned to a 16-byte boundary, or should it be aligned manually? I mean, is it aligned even before the prologue (push rbp; mov rbp, rsp) in _start?
I know that on x86-64 RSP is aligned to 8 bytes at the start of the program, but I do not know if it's aligned to 16 bytes. For some tasks I might need that alignment to properly execute SSE instructions which require alignment on a 16-byte boundary.
The x86-64 ABI explicitly says (3.4.1 Initial Stack and Register State) :
%rsp The stack pointer holds the address of the byte with lowest
address which is part of the stack. It is guaranteed to be 16-byte
aligned at process entry.
Since _start is the first symbol that's called when a process is entered, you can be entirely sure that it is 16-byte aligned when the OS calls _start in your executable.
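As a small illustration of what "16-byte aligned" means in practice, here is a Python sketch of the check. The rsp value is invented and is_aligned is a hypothetical helper. It also shows the one difference from an ordinary function: _start is entered by a jump from the kernel, not by a call, so no return address has been pushed yet.

```python
def is_aligned(address, boundary=16):
    """True if address is a multiple of boundary (boundary must be a power of two)."""
    return (address & (boundary - 1)) == 0


rsp_at_start = 0x7fffffffe000          # hypothetical rsp on entry to _start
rsp_in_callee = rsp_at_start - 8       # after a call pushes an 8-byte return address

print(is_aligned(rsp_at_start))        # aligned at process entry
print(is_aligned(rsp_in_callee))       # off by 8 on entry to a called function
print(is_aligned(rsp_in_callee - 8))   # aligned again after one push (e.g. push rbp)
```

This is why a called function typically realigns with a single push or an 8-byte sub, while _start can use rsp as it finds it.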

Why does Linux save %ebp when doing a context switch?

When doing a context switch, x86 Linux (very cleverly) avoids saving and restoring EAX, EBX, ECX, EDX, ESI, and EDI. Of course, the userland values are saved on the kernel stack when switching into kernel mode. But the values in the kernel code are not saved -- instead, GCC directives are used which tell the compiler not to keep any values which are needed in those registers at the point where the switch happens.
Naturally, ESP has to be saved and restored. But this is what I don't understand: before ESP is switched, EBP is pushed on the kernel stack. I would think that EBP was being used as a frame pointer, but in my kernel debugger, the values sure don't look like it:
(gdb) print $esp
$22 = (void *) 0xc0025ec0
(gdb) print $ebp
$23 = (void *) 0xcf827f3c
The difference is way too big for EBP to be a frame pointer here. A comment in the code says that "EBP is saved/restored explicitly for wchan access", but I'm searching the code and can't figure out how that is so. Google isn't helping either. Can some kernel wizard step in and help here?
The difference is way too big for EBP to be a frame pointer here.
Presumably you have compiled your kernel without frame pointers enabled. See the relevant config option:
config SCHED_OMIT_FRAME_POINTER
    def_bool y
    prompt "Single-depth WCHAN output"
    depends on X86
    ---help---
      Calculate simpler /proc/<PID>/wchan values. If this option
      is disabled then wchan values will recurse back to the
      caller function. This provides more accurate wchan values,
      at the expense of slightly more scheduling overhead.

      If in doubt, say "Y".
The function get_wchan will do a sanity check on the ebp value, and only use it if it seems to be a frame pointer.
I think it would be better to use the above config flag in both places, so that ebp would not be saved unnecessarily if it isn't a frame pointer, and also the get_wchan would not bother if we knew there wouldn't be a frame pointer. That said, saving/restoring ebp only adds a very little overhead, so it's not tragic.
I have figured it out. EBP is a frame pointer, but at the point that I checked its value, ESP had already been switched to the new process' kernel stack, but EBP had not yet been restored (so it still had the value from the previous process). Sorry!!
The reason for storing the frame pointer is so that others can determine where in the kernel code a process went to sleep. Among other things, this is used by /proc/PID/wchan, which prints the name of the kernel function which made a process sleep.
The code which checks this is as follows (details removed for brevity):
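unsigned long get_wchan(struct task_struct *p)
{
    unsigned long sp, bp, ip;
    int count = 0;  /* the loop needs this counter; it was dropped from the excerpt */

    sp = p->thread.sp;
    bp = *(unsigned long *) sp;           /* saved EBP on top of the sleeping task's stack */
    do {
        ip = *(unsigned long *) (bp+4);   /* return address just above the saved EBP */
        if (!in_sched_functions(ip))
            return ip;                    /* first caller outside the scheduler */
        bp = *(unsigned long *) bp;       /* follow the chain to the caller's frame */
    } while (count++ < 16);
    return 0;
}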
Since EBP is pushed right before switching kernel stacks, the stack pointer of a sleeping process will point to the saved EBP (frame pointer) value. That frame pointer points to the caller's saved frame pointer, which points to the previous caller's, which points to the previous caller's... in other words, the saved frame pointers form a linked list going back up the call stack.
The frame pointer is saved immediately on function entry, so the value just above it (4 bytes up) is the return address to the calling function.
The loop in get_wchan walks that "linked list" (bp = *bp), checking the return address above each saved frame pointer, until it finds an address within a function like ep_poll or futex_wait_queue_me.
get_wchan just returns an address inside a function; for display in /proc, lookup_symbol_name is used to convert that address into a function name.
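The walk is easier to see on fake data. Below is a Python sketch of the same idea, with memory modelled as a dictionary from address to 4-byte value. All addresses, and the address range treated as "scheduler code", are invented for the example, and the iteration bound is simplified compared with the C loop.

```python
def get_wchan(memory, sp, in_sched_functions, max_frames=16):
    """Follow saved frame pointers until a return address outside the scheduler is found."""
    bp = memory[sp]                      # saved EBP sits at the top of the sleeping stack
    for _ in range(max_frames):
        ip = memory[bp + 4]              # return address is 4 bytes above the saved EBP
        if not in_sched_functions(ip):
            return ip
        bp = memory[bp]                  # next element of the "linked list" of frames
    return 0


# Hypothetical kernel stack contents for one sleeping task.
memory = {
    0xc0025ec0: 0xc0025f00,  # [sp]    -> saved EBP
    0xc0025f00: 0xc0025f40,  # [bp]    -> caller's saved EBP
    0xc0025f04: 0xc0100100,  # [bp+4]  -> return address inside the scheduler
    0xc0025f40: 0xc0025f80,  # next frame's saved EBP
    0xc0025f44: 0xc0200200,  # return address inside the function that went to sleep
}


def in_sched_functions(ip):
    """Assumed address range of scheduler code, for illustration only."""
    return 0xc0100000 <= ip < 0xc0101000


wchan = get_wchan(memory, 0xc0025ec0, in_sched_functions)
print(hex(wchan))
```

The first frame returns into scheduler code and is skipped. The second frame returns into some other function, and that address is what would be turned into a name for /proc/PID/wchan.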

Trying to understand gcc's complicated stack-alignment at the top of main that copies the return address

Hi, I have disassembled some programs I wrote (on Linux) to better understand how they work, and I noticed that the main function always begins with:
lea ecx,[esp+0x4] ; I assume this is for getting the adress of the first argument of the main...why ?
and esp,0xfffffff0 ; ??? is the compiler trying to align the stack pointer on 16 bytes ???
push DWORD PTR [ecx-0x4] ; I understand the assembler is pushing the return adress....why ?
push ebp
mov ebp,esp
push ecx ;why is ecx pushed too ??
So my question is: why is all this work done?
I only understand the use of:
push ebp
mov ebp,esp
the rest seems useless to me...
I've had a go at it:
;# As you have already noticed, the compiler wants to align the stack
;# pointer on a 16 byte boundary before it pushes anything. That's
;# because certain instructions' memory access needs to be aligned
;# that way.
;# So in order to first save the original offset of esp (+4), it
;# executes the first instruction:
lea ecx,[esp+0x4]
;# Now alignment can happen. Without the previous insn the next one
;# would have made the original esp unrecoverable:
and esp,0xfffffff0
;# Next it pushes the return address and creates a stack frame. I
;# assume it now wants to make the stack look like a normal
;# subroutine call:
push DWORD PTR [ecx-0x4]
push ebp
mov ebp,esp
;# Remember that ecx is still the only value that can restore the
;# original esp. Since ecx may be garbled by any subroutine calls,
;# it has to save it somewhere:
push ecx
This is done to keep the stack aligned to a 16-byte boundary. Some instructions require certain data types to be aligned on as much as a 16-byte boundary. In order to meet this requirement, GCC makes sure that the stack is initially 16-byte aligned, and allocates stack space in multiples of 16 bytes. This can be controlled using the option -mpreferred-stack-boundary=num. If you use -mpreferred-stack-boundary=2 (for a 2^2 = 4-byte alignment), this alignment code will not be generated because the stack is always at least 4-byte aligned. However, you could then have trouble if your program uses any data types that require stronger alignment.
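The relationship between num, the boundary and the and mask can be checked with a few lines of Python. This is just the arithmetic, with a made-up esp value and a hypothetical helper name.

```python
def align_down(esp, num):
    """Round a 32-bit esp down to a 2**num byte boundary, like 'and esp, mask'."""
    boundary = 1 << num                      # num=4 -> 16 bytes, num=2 -> 4 bytes
    mask = ~(boundary - 1) & 0xffffffff      # num=4 -> 0xfffffff0
    return esp & mask


esp = 0xffffd18c                             # hypothetical esp on entry to main

print(hex(align_down(esp, 4)))               # same effect as: and esp,0xfffffff0
print(hex(align_down(esp, 2)))               # already 4-byte aligned, so unchanged
```

With num=2 the mask never changes a value that is already a multiple of 4, which is why the realignment code becomes unnecessary.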
According to the gcc manual:
On Pentium and PentiumPro, double and long double values should be aligned to an 8 byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 may not work properly if it is not 16 byte aligned.
To ensure proper alignment of this values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. It is recommended that libraries that use callbacks always use the default setting.
This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.
The lea loads the original stack pointer (from before the call to main) into ecx, since the stack pointer is about to be modified. This is used for two purposes:
- to access the arguments to the main function, since they are relative to the original stack pointer
- to restore the stack pointer to its original value when returning from main
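Stepping through the six instructions with concrete numbers may help. The Python sketch below models memory as a dictionary and mirrors the prologue one line per instruction. The entry esp, the old ebp and the return address are invented values. The final comments describe the epilogue GCC typically pairs with this prologue (mov ecx,[ebp-4]; leave; lea esp,[ecx-4]; ret), which is my recollection of the usual output rather than something shown in the question.

```python
def simulate_main_prologue(esp, ebp, memory):
    """Replay gcc's realigning prologue for main on a toy 32-bit stack."""

    def push(value):
        nonlocal esp
        esp -= 4
        memory[esp] = value

    ecx = esp + 4             # lea  ecx,[esp+0x4]      remember the pre-call esp
    esp &= 0xfffffff0         # and  esp,0xfffffff0     realign to 16 bytes
    push(memory[ecx - 4])     # push DWORD PTR [ecx-0x4] copy of the return address
    push(ebp)                 # push ebp
    ebp = esp                 # mov  ebp,esp
    push(ecx)                 # push ecx                saved so esp can be restored
    return esp, ebp, ecx


entry_esp = 0xffffd18c                    # hypothetical esp on entry to main
memory = {entry_esp: 0xf7e01234}          # hypothetical return address at [esp]

esp, ebp, ecx = simulate_main_prologue(entry_esp, 0xffffd1f8, memory)

print(hex(memory[ebp + 4]))               # looks like a normal frame: return address
print(hex(memory[ebp - 4] - 4))           # what the epilogue computes: the entry esp
```

The copied return address makes ebp+4 look the same as in any other function, so debuggers can still unwind through main, and the saved ecx is the only record of where the unaligned stack really was.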
Even if every instruction worked perfectly with no speed penalty despite arbitrarily aligned operands, alignment would still increase performance. Imagine a loop referencing a 16-byte quantity that just overlaps two cache lines. Now, to load that little wchar into the cache, two entire cache lines have to be evicted, and what if you need them in the same loop? The cache is so tremendously faster than RAM that cache performance is always critical.
Also, there usually is a speed penalty to shift misaligned operands into the registers.
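The cache-line point can be made concrete with a little arithmetic. This Python sketch counts how many cache lines an access touches, assuming 64-byte lines (common on current x86 processors, but an assumption here) and made-up addresses.

```python
def cache_lines_touched(address, size, line_size=64):
    """Number of cache lines covered by the bytes [address, address + size)."""
    first_line = address // line_size
    last_line = (address + size - 1) // line_size
    return last_line - first_line + 1


print(cache_lines_touched(0x1030, 16))   # 16-byte aligned: stays inside one line
print(cache_lines_touched(0x1038, 16))   # misaligned: straddles a line boundary
```

A 16-byte value that starts on a 16-byte boundary can never straddle a 64-byte line, because 16 divides 64 evenly.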
Given that the stack is being realigned, we naturally have to save the old alignment in order to traverse stack frames for parameters and returning.
ecx is a temporary register so it has to be saved. Also, depending on optimization level, some of the frame linkage ops that don't seem strictly necessary to run the program might well be important in order to set up a trace-ready chain of frames.
