Why does Linux save %ebp when doing a context switch?

Why does Linux save %ebp when doing a context switch? - linux

When doing a context switch, x86 Linux (very cleverly) avoids saving and restoring EAX, EBX, ECX, EDX, ESI, and EDI. Of course, the userland values are saved on the kernel stack when switching into kernel mode. But the values in the kernel code are not saved -- instead, GCC directives are used which tell the compiler not to keep any values which are needed in those registers at the point where the switch happens.
Naturally, ESP has to be saved and restored. But this is what I don't understand: before ESP is switched, EBP is pushed on the kernel stack. I would think that EBP was being used as a frame pointer, but in my kernel debugger, the values sure don't look like it:
(gdb) print $esp
$22 = (void *) 0xc0025ec0
(gdb) print $ebp
$23 = (void *) 0xcf827f3c
The difference is way too big for EBP to be a frame pointer here. A comment in the code says that "EBP is saved/restored explicitly for wchan access", but I'm searching the code and can't figure out how that is so. Google isn't helping either. Can some kernel wizard step in and help here?

The difference is way too big for EBP to be a frame pointer here.
Presumably you have compiled your kernel without frame pointers enabled. See the relevant config option:
config SCHED_OMIT_FRAME_POINTER
def_bool y
prompt "Single-depth WCHAN output"
depends on X86
---help---
Calculate simpler /proc/<PID>/wchan values. If this option
is disabled then wchan values will recurse back to the
caller function. This provides more accurate wchan values,
at the expense of slightly more scheduling overhead.
If in doubt, say "Y".
The function get_wchan will do a sanity check on the ebp value, and only use it if it seems to be a frame pointer.
I think it would be better to use the above config flag in both places, so that ebp would not be saved unnecessarily if it isn't a frame pointer, and also the get_wchan would not bother if we knew there wouldn't be a frame pointer. That said, saving/restoring ebp only adds a very little overhead, so it's not tragic.

I have figured it out. EBP is a frame pointer, but at the point that I checked its value, ESP had already been switched to the new process' kernel stack, but EBP had not yet been restored (so it still had the value from the previous process). Sorry!!
The reason for storing the frame pointer is so that others can determine where in the kernel code a process went to sleep. Among other things, this is used by /proc/PID/wchan, which prints the name of the kernel function which made a process sleep.
The code which checks this is as follows (details removed for brevity):
unsigned long get_wchan(struct task_struct *p)
{
unsigned long sp, bp, ip;
sp = p->thread.sp;
bp = *(unsigned long *) sp;
do {
ip = *(unsigned long *) (bp+4);
if (!in_sched_functions(ip))
return ip;
bp = *(unsigned long *) bp;
} while (count++ < 16);
return 0;
}
Since EBP is pushed right before switching kernel stacks, the stack pointer of a sleeping process will point to the saved EBP (frame pointer) value. That frame pointer points to the caller's saved frame pointer, which points to the previous caller's, which points to the previous caller's... in other words, the saved frame pointers form a linked list going back up the call stack.
The frame pointer is saved immediately on function entry, so the value just above it (4 bytes up) is the return address to the calling function.
The loop in get_wchan walks that "linked list" (bp = *bp), checking the return address above each saved frame pointer, until it finds an address within a function like ep_poll or futex_wait_queue_me.
get_wchan just returns an address inside a function; for display in /proc, lookup_symbol_name is used to convert that address into a function name.

Related

Faking ASM Return Address?

Would it be possible to fake the return address at, ebp + 4.
I'm currently writing a DLL that you would inject into a game, in which it would call game functions to do things, but the functions I call check the return address against the program itself, and if its outside their base it detects it.
So basically is there any way to fake the return address in any way?
It works like this:
if ( (_BYTE *)retaddr - (_BYTE *)unusedPadding >= (unsigned int)&byte_A6132A )
{
dword_11E59F8 |= 0x200000u;
dword_162D06C = 0;
result = (void (*)())sub_51FEE0(dword_11E59FC, v5, (_BYTE *)retaddr - (_BYTE *)unusedPadding, ebx0, edi0, a1);
}

Better way:
push returnshere
push your_second_argument
push your_first_argument
push address_of_fragment_in_exe
jmp function_you_want_to_call
returnshere:
; more code
Where address_of_ret_in_exe is the address of this fragment:
add esp, 8 ;4 * number of arguments pushed
ret
This has the advantage of not editing the game binary. I've seen more than one game that checksummed itself so if you edited it, even in slack space, you're in trouble. If they went through so much trouble as to verify calls come from the game binary, than they likely have defenses against the game binary from being edited. Just be glad they don't trace the call graph.

You mean so on entry to the called function, the return address at [esp] is not the actual call site?
You can emulate call by pushing whatever you want and then jumping. Of course, the function you call this way will return to the return address you give it. There would also be a significant perf penalty for mismatched call/ret, because you'll break the CPU's return-address predictor stack.
Can you put some trampoline functions at an acceptable address range, and call through them? Although that's really inconvenient in a 32bit ABI which passes args on the stack. I guess you'd have to pass an extra arg to the trampoline function that it could use to stash the return address, instead of copying all the args.
So the trampoline could be something like:
mov [esp+20], esi ; save esi in a dummy arg slot
mov esi, [esp] ; save the orig return address in a call-preserved reg
add esp, 4 ; call with the original args
call lib_function
push esi ; restore the orig return address
mov esi, [esp+20] ; and restore esi
ret ; return to orig return address
That's not wonderful, and takes a lot of code for something that needs to be duplicated for every library functions. (And it doesn't make a stack frame, so might hurt debugging / exception handling?) For functions without many args it might be shorter to do something like
push [esp+8] ; 2nd arg
push [esp+8] ; 1st arg
call lib_function
add esp, 8
ret
Using indirect calls would let you use the same trampoline for multiple functions, but at the cost of branch mispredicts if there isn't a simple short pattern.
And of course none of these trampolines can work unless you can stick them in memory at an address the library will accept calls from.

x64 memset core, is passed buffer address truncated?

1. Problem Background
Recently a core dump occurred on one of our on-line search server. The core happens in memset() due to the attempt to write to an invalid address, and hence received the SIGSEGV signal. The following information is from dmsg:
is_searcher_ser[17405]: segfault at 000000002c32a668 rip 0000003da0a7b006 rsp 0000000053abc790 error 6
The environment of our on-line servers goes as follows:
OS: RHEL 5.3
Kernel: 2.6.18-131.el5.custom, x86_64 (64-bit)
GCC: 4.1.2 20080704 (Red Hat 4.1.2-44)
Glibc: glibc-2.5-49.6
The following is the relevant code snippet:
CHashMap<…>::CHashMap(…)
{
…
typedef HashEntry *HashEntryPtr;
m_ppEntry = new HashEntryPtr[m_nHashSize]; // m_nHashSize is 389 when core
assert(m_ppEntry != NULL);
memset(m_ppEntry, 0x0, m_nHashSize*sizeof(HashEntryPtr)); // Core in this memset() invocation
…
}
The assembly code of the above code is:
…
0x000000000091fe9e <+110>: callq 0x502638 <_Znam#plt> // new HashEntryPtr[m_nHashSize]
0x000000000091fea3 <+115>: mov 0xc(%rbx),%edx // Get the value of m_nHashSize
0x000000000091fea6 <+118>: mov %rax,%rdi // Put m_ppEntry pointer to %rdi for later memset invocation
0x000000000091fea9 <+121>: mov %rax,0x20(%rbx) // Store the pointer to m_ppEntry member variable(%rbx holds the this pointer)
0x000000000091fead <+125>: xor %esi,%esi // Generate 0
0x000000000091feaf <+127>: shl $0x3,%rdx // m_nHashSize*sizeof(HashEntryPtr)
0x000000000091feb3 <+131>: callq 0x502b38 <memset#plt> // Call the memset() function
…
In the core dump, the assembly of memset#plt is:
(gdb) disassemble 0x502b38
Dump of assembler code for function memset#plt:
0x0000000000502b38 <+0>: jmpq *0x771b92(%rip) # 0xc746d0 <memset#got.plt>
0x0000000000502b3e <+6>: pushq $0x53
0x0000000000502b43 <+11>: jmpq 0x5025f8
End of assembler dump.
(gdb) x/ag 0x0000000000502b3e+0x771b92
0xc746d0 <memset#got.plt>: 0x3da0a7acb0 <memset>
(gdb) disassemble 0x3da0a7acb0
Dump of assembler code for function memset:
0x0000003da0a7acb0 <+0>: cmp $0x1,%rdx
0x0000003da0a7acb4 <+4>: mov %rdi,%rax
…
For the above GDB analysis， we know that the address of memset() has been resolved in the relocation PLT table. That is to say, the first jmpq *0x771b92(%rip) will directly jump to the first instruction of function memset(). Besides, the program had run nearly one day on-line, the relocation address of memset() should have been already resolved earlier.
2. Weird phenomenon
This core fired at the instruction => 0x0000003da0a7b006 <+854>: mov %rdx,-0x8(%rdi) in the memset(). Actually this is the instruction in the memset() to set the 0 at the right begin position of the buffer which is the first parameter of memset().
When cored , in frame 0, the value of $rdi is 0x2c32a670 ,and $rax is 0x2c32a668. From the assembly analysis and off-line test, $rax should hold the source buffer of the memset, i.e., the first parameter of memset().
So, in our example, $rax should be same as the address of m_ppEntry, the value of which is stored in the this object (this pointer is stored in %rbx) first before it is zeroed by memset later. However, the value of m_ppEntry is 0x2ab02c32a668.
Then use info files GDB command to check, the address 0x2c32a668 is indeed invalid (not mapped), and address 0x2ab02c32a668 is a valid address.
3. Why it is weird?
The weird place of this core is that: If the real address of memset has been resolved already(very very probably), then there are only very few instructions between the operation to put the pointer value into m_ppEntry and the attempt to memset it. And actually the value of register $rax (holding the passed buffer address) are not changed at all during these instructions. So, how can m_ppEntry isn’t equal to $rax?
What is weird More is that: when core, the value of $rax (0x2c32a668) is actually the value of lower 4 bytes of m_ppEntry (0x2ab02c32a668). If there is indeed some relationship between the two values, is the m_ppEntry parameter passed to memset being truncated? However, the involved several instructions all use %rax, rather than %eax. By the way, I cannot reproduce this issue offline.
So,
1) Which address is valid? If 0x2c32a668 is valid? Is the heap corrupted just between the several instructions? And how to paraphrase that the value of m_ppEntry is 0x2ab02c32a668, and why the low 4 bytes of this two value is the same?
2) If 0x2ab02c32a668 is valid, why the address is truncated when passed into the 64-bit memset()? Under which condition this error will occur? I cannot reproduce this offline. Is this issue an known bug? I didn't find it through Google.
3) Or, is it due to some hardware or power issue to make the 4 higher bytes of %rdi passed to memset zeroed? (I’m very very reluctant to believe this).
At last, any comment on this core is appreciated.
Thanks,
Gary Hu

I'm assuming most of the time this code works fine, given your mention of one day's running.
I agree signals are worth inspecting, it does look suspiciously like pointer truncation is happening somewhere else.
Only other thing I'm thinking it could be an issue with the new. Is there any possibly that on occasion you could end up calling an overloaded new operator?
Also for completeness what is the declaration of m_ppEntry ?
I'm assuming you're using a no throw new otherwise the assert(m_ppEntry != NULL); would be meaningless.

What is %gs in Assembly

void return_input (void)
{
char array[30];
gets (array);
printf("%s\n", array);
}
After compiling it in gcc, this function is converted to the following Assembly code:
push %ebp
mov %esp,%ebp
sub $0x28,%esp
mov %gs:0x14,%eax
mov %eax,-0x4(%ebp)
xor %eax,%eax
lea -0x22(%ebp),%eax
mov %eax,(%esp)
call 0x8048374
lea -0x22(%ebp),%eax
mov %eax,(%esp)
call 0x80483a4
mov -0x4(%ebp),%eax
xor %gs:0x14,%eax
je 0x80484ac
call 0x8048394
leave
ret
I don't understand two lines:
mov %gs:0x14,%eax
xor %gs:0x14,%eax
What is %gs, and what exactly these two lines do?
This is compilation command:
cc -c -mpreferred-stack-boundary=2 -ggdb file.c

GS is a segment register, its use in linux can be read up on here (its basically used for per thread data).
mov %gs:0x14,%eax
xor %gs:0x14,%eax
this code is used to validate that the stack hasn't exploded or been corrupted, using a canary value stored at GS+0x14, see this.
gcc -fstack-protector=strong is on by default in many modern distros; you can use gcc -fno-stack-protector to not add those checks. (On x86, thread-local storage is cheap so GCC keeps the randomized canary value there, making it somewhat harder to leak.)

In the AT&T style assembly languages, the percent sigil generally indicates a register. In x86 family processors from 386 onwards, GS is one of the so-called segment registers. However, in protected mode environments segment registers work as selector registers.
A virtual memory selector represents its own mapping of virtual address space together with its own access regime. In practical terms, %gs:0x14 can be thought of as a reference into an array whose origin is held in %gs (albeit the CPU does a bit of extra dereferencing). On modern GNU/Linux systems, %gs is usually used to point at the thread-local storage region. In the code you're asking about, however, only one item of the TLS matters — the stack canary.
The idea is to attempt to detect a buffer overflow error by placing a random but constant value — it's called a stack canary in memory of the canaries coal miners used to employ to signal increase in levels of poisonous gases by dying — into the stack before gets() gets called, above its stack frame, and check whether it is still there after gets() will have returned. gets() has no business overwriting this part of the stack — it is outside its own stack frame, and it is not given a pointer to it —, so if the stack canary has died, something has gone wrong in a dangerous way. (C as a programming environment happens to be particularly prone to this kind of wrong-goings, and security researchers have learnt to exploit many of them over the last twenty years or so. Also, gets() happens to be a function that is inherently at risk to overflow its target buffer.) You have not offered addresses with your code, but 0x80484ac is likely the address of leave, and the call 0x8048394 which is executed in case of mismatch (that is, jumped over by je 0x80484ac in case of match), is probably a call to __stack_chk_fail(), provided by libc to handle the stack corruption by fleeing the metaphorical poisonous mine.
The reason the canonical value of the stack canary is kept in the thread-local storage is that this way, every thread can have its own stack canary. Stacks themselves are normally not shared between threads, so it is natural to also not share the canary value.

ES, FS, GS: Extra Segment Registers
Can be used as extra segment registers; also used in special instructions that span segments (like string copies).
taken from here
http://www.hep.wisc.edu/~pinghc/x86AssmTutorial.htm
hope it helps

Can anybody explain some simple assembly code?

I have just started to learn assembly. This is the dump from gdb for a simple program which prints hello ranjit.
Dump of assembler code for function main:
0x080483b4 <+0>: push %ebp
0x080483b5 <+1>: mov %esp,%ebp
0x080483b7 <+3>: sub $0x4,%esp
=> 0x080483ba <+6>: movl $0x8048490,(%esp)
0x080483c1 <+13>: call 0x80482f0 <puts#plt>
0x080483c6 <+18>: leave
0x080483c7 <+19>: ret
My questions are :
Why every time ebp is pushed on to stack at start of the program? What is in the ebp which is necessary to run this program?
In second line why is ebp copied to esp?
I can't get the third line at all. what I know about SUB syntax is "sub dest,source", but here how can esp be subtracted from 4 and stored in 4?
What is this value "$0x8048490"? Why it is moved to esp, and why this time is esp closed in brackets? Does it denote something different than esp without brackets?
Next line is the call to function but what is this "0x80482f0"?
What is leave and ret (maybe ret means returning to lib c.)?
operating system : ubuntu 10, compiler : gcc

ebp is used as a frame pointer in Intel processors (assuming you're using a calling convention that uses frames).
It provides a known point of reference for locating passed-in parameters (on one side) and local variables (on the other) no matter what you do with the stack pointer while your function is active.
The sequence:
push %ebp ; save callers frame pointer
mov %esp,%ebp ; create a new frame pointer
sub $N,%esp ; make space for locals
saves the frame pointer for the previous stack frame (the caller), loads up a new frame pointer, then adjusts the stack to hold things for the current "stack level".
Since parameters would have been pushed before setting up the frame, they can be accessed with [bp+N] where N is a suitable offset.
Similarly, because locals are created "under" the frame pointer, they can be accessed with [bp-N].
The leave instruction is a single one which undoes that stack frame. You used to have to do it manually but Intel introduced a faster way of getting it done. It's functionally equivalent to:
mov %ebp, %esp ; restore the old stack pointer
pop %ebp ; and frame pointer
(the old, manual way).
Answering the questions one by one in case I've missed something:
To start a new frame. See above.
It isn't. esp is copied to ebp. This is AT&T notation (the %reg is a dead giveaway) where (among other thing) source and destination operands are swapped relative to Intel notation.
See answer to (2) above. You're subtracting 4 from esp, not the other way around.
It's a parameter being passed to the function at 0x80482f0. It's not being loaded into esp but into the memory pointed at by esp. In other words, it's being pushed on the stack. Since the function being called is puts (see (5) below), it will be the address of the string you want putsed.
The function name in the <> after the address. It's calling the puts function (probably the one in the standard library though that's not guaranteed). For a description of what the PLT is, see here.
I've already explained leave above as unwinding the current stack frame before exiting. The ret simply returns from the current function. If the current functtion is main, it's going back to the C startup code.

In my career I learned several assembly languages, you didn't mention which but it appears Intel x86 (segmented memory model as PaxDiablo pointed out). However, I have not used assembly since last century (lucky me!). Here are some of your answers:
The EBP register is pushed onto the stack at the beginning because we need it further along in other operations of the routine. You don't want to just discard its original value thus corrupting the integrity of the rest of the application.
If I remember correctly (I may be wrong, long time) it is the other way around, we are moving %esp INTO %ebp, remember we saved it in the previous line? now we are storing some new value without destroying the original one.
Actually they are SUBstracting the value of four (4) FROM the contents of the %esp register. The resulting value is not stored on "four" but on %esp. If %esp had 0xFFF8 after the SUB it will contain 0xFFF4. I think this is called "Immediate" if my memory serves me. What is happening here (I reckon) is the computation of a memory address (4 bytes less).
The value $0x8048490 I don't know. However, it is NOT being moved INTO %esp but rather INTO THE ADDRESS POINTED TO BY THE CONTENTS OF %esp. That is why the notation is (%esp) rather than %esp. This is kind of a common notation in all assembly languages I came about in my career. If on the other hand the right operand was simply %esp, then the value would have been moved INTO the %esp register. Basically the %esp register's contents are being used for addressing.
It is a fixed value and the string on the right makes me think that this value is actually the address of the puts() (Put String) compiler library routine.
"leave" is an instrution that is the equivalent of "pop %ebp". Remember we saved the contents of %ebp at the beginning, now that we are done with the routine we are restoring it back into the register so that the caller gets back to its context. The "ret" instruction is the final instruction of the routine, it "returns" to the caller.

Trying to understand gcc's complicated stack-alignment at the top of main that copies the return address

hi I have disassembled some programs (linux) I wrote to understand better how it works, and I noticed that the main function always begins with:
lea ecx,[esp+0x4] ; I assume this is for getting the adress of the first argument of the main...why ?
and esp,0xfffffff0 ; ??? is the compiler trying to align the stack pointer on 16 bytes ???
push DWORD PTR [ecx-0x4] ; I understand the assembler is pushing the return adress....why ?
push ebp
mov ebp,esp
push ecx ;why is ecx pushed too ??
so my question is: why all this work is done ??
I only understand the use of:
push ebp
mov ebp,esp
the rest seems useless to me...

I've had a go at it:
;# As you have already noticed, the compiler wants to align the stack
;# pointer on a 16 byte boundary before it pushes anything. That's
;# because certain instructions' memory access needs to be aligned
;# that way.
;# So in order to first save the original offset of esp (+4), it
;# executes the first instruction:
lea ecx,[esp+0x4]
;# Now alignment can happen. Without the previous insn the next one
;# would have made the original esp unrecoverable:
and esp,0xfffffff0
;# Next it pushes the return addresss and creates a stack frame. I
;# assume it now wants to make the stack look like a normal
;# subroutine call:
push DWORD PTR [ecx-0x4]
push ebp
mov ebp,esp
;# Remember that ecx is still the only value that can restore the
;# original esp. Since ecx may be garbled by any subroutine calls,
;# it has to save it somewhere:
push ecx

This is done to keep the stack aligned to a 16-byte boundary. Some instructions require certain data types to be aligned on as much as a 16-byte boundary. In order to meet this requirement, GCC makes sure that the stack is initially 16-byte aligned, and allocates stack space in multiples of 16 bytes. This can be controlled using the option -mpreferred-stack-boundary=num. If you use -mpreferred-stack-boundary=2 (for a 22=4-byte alignment), this alignment code will not be generated because the stack is always at least 4-byte aligned. However you could then have trouble if your program uses any data types that require stronger alignment.
According to the gcc manual:
On Pentium and PentiumPro, double and long double values should be aligned to an 8 byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 may not work properly if it is not 16 byte aligned.
To ensure proper alignment of this values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. It is recommended that libraries that use callbacks always use the default setting.
This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.
The lea loads the original stack pointer (from before the call to main) into ecx, since the stack pointer is about to modified. This is used for two purposes:
to access the arguments to the main function, since they are relative to the original stack pointer
to restore the stack pointer to its original value when returning from main

lea ecx,[esp+0x4] ; I assume this is for getting the adress of the first argument of the main...why ?
and esp,0xfffffff0 ; ??? is the compiler trying to align the stack pointer on 16 bytes ???
push DWORD PTR [ecx-0x4] ; I understand the assembler is pushing the return adress....why ?
push ebp
mov ebp,esp
push ecx ;why is ecx pushed too ??
Even if every instruction worked perfectly with no speed penalty despite arbitrarily aligned operands, alignment would still increase performance. Imagine a loop referencing a 16-byte quantity that just overlaps two cache lines. Now, to load that little wchar into the cache, two entire cache lines have to be evicted, and what if you need them in the same loop? The cache is so tremendously faster than RAM that cache performance is always critical.
Also, there usually is a speed penalty to shift misaligned operands into the registers.
Given that the stack is being realigned, we naturally have to save the old alignment in order to traverse stack frames for parameters and returning.
ecx is a temporary register so it has to be saved. Also, depending on optimization level, some of the frame linkage ops that don't seem strictly necessary to run the program might well be important in order to set up a trace-ready chain of frames.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string