Are return-to-libc attacks possible now? - linux

I'v read that to make a successful return to lib-c attack, the attacker should store the address of the command (for example "bin/sh") on the stack exactly after the return address to 'system' function (for example).
So as the 'system()' function reads that address as its 'parameter' and executes that command. But now after disassembling a program which calls system() I noticed that it doesn't use the stack to get the address of that string ("bin/sh"). Instead the address is stored in EDI or RDI registers. As long as the attacker can't access the registers how is it possible to perform such an attack?

It might actually be easy for an attacker to have the correct value stored in RDI without having direct access to the registers.
Take a typical vulnerable C function that might lead to a return-to-libc attack:
void f(const char* str)
{
char buf[BUF_LEN];
strcpy(buf, str);
}
On my machine, the following assembly is generated for the call to strcpy:
movq %rdi, -24(%rbp)
movq -24(%rbp), %rdx
leaq -16(%rbp), %rax
movq %rdx, %rsi
movq %rax, %rdi
call strcpy
As you can see, the destination buffer of strcpy, which is the vulnerable buffer, is stored in RDI. This means that if we manage to overwrite the return address with the address of system, it will be called with a pointer to the vulnerable buffer as its argument.
A small disclaimer: this particular example is just to illustrate that having the correct value in a register can be accomplished by other ways than having direct access to the register. The example itself is actually very difficult to exploit since strcpy will stop after the first null byte.

Related

AT&T 32-bit assembly does pushing strings after function call works?

im currently trying to understand a AT&T 32-bit assembly code and i've stumbled upon these instructions and im trying to make sense out of them:
_start:
jmp B
A:
# fd = open("libtest.so.1.0", O_RDONLY);
xorl %ecx, %ecx
movb $5, %al
popl %ebx
xorl %ecx, %ecx
int $0x80
B:
call A
.string "/lib/libtest.so.1.0"
A goes on for abit longer but it doesn't matter, my problem is within B, how is it possible to push the string after the call instruction was made? i don't see any way the string ended up in ebx other than some sort of argument passing i don't understand yet.
call pushes the return address on stack, i.e. the address following the call, which would be the address of the path string here.
Normally a ret would then pop that off and return control to the caller, but here the code pops the address into ebx and uses it as a parameter for the interrupt.
Linux will start running the program from _start label which in turn its first line jump to B, B call A this will cause the processor to push return address of instruction after the call (which in this case is just address of string) onto the stack and then processor will go to A to execute instruction from there, A continue to execute instructions until it finishes but it doesn't return using ret instruction(which will cause popping return address from the stack and jumping to this address) so processor will just keep executing instructions in sequence as normal which means it will execute call instruction again (infinite loop).

Linux sbrk() as a syscall in assembly

So, as a challenge, and for performance, I'm writing a simple server in assembly. The only way I know of is via system calls. (through int 0x80) Obviously, I'm going to need more memory than allocated at assemble, or at load, so I read up and decided I wanted to use sbrk(), mainly because I don't understand mmap() :p
At any rate, Linux provides no interrupt for sbrk(), only brk().
So... how do I find the current program break to use brk()? I thought about using getrlimit(), but I don't know how to get a resource (the process id I'd guess) to pass to getrlimit(). Or should I find some other way to implement sbrk()?
The sbrk function can be implemented by getting the current value and subtracting the desired amount manually. Some systems allow you to get the current value with brk(0), others keep track of it in a variable [which is initialized with the address of _end, which is set up by the linker to point to the initial break value].
This is a very platform-specific thing, so YMMV.
EDIT: On linux:
However, the actual Linux system call returns the new program break on success. On failure, the system call returns the current break. The glibc wrapper function does some work (i.e., checks whether the new break is less than addr) to provide the 0 and -1 return values described above.
So from assembly, you can call it with an absurd value like 0 or -1 to get the current value.
Be aware that you cannot "free" memory allocated via brk - you may want to just link in a malloc function written in C. Calling C functions from assembly isn't hard.
Source:
#include <unistd.h>
#define SOME_NUMBER 8
int main() {
void *ptr = sbrk(8);
return 0;
}
Compile using with Assembly Output option
gcc -S -o test.S test.c
Then look at the ASM code
_main:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
subq $16, %rsp
Ltmp2:
movl $8, %eax
movl %eax, %edi
callq _sbrk
movq %rax, -16(%rbp)
movl $0, -8(%rbp)
movl -8(%rbp), %eax
movl %eax, -4(%rbp)
movl -4(%rbp), %eax
addq $16, %rsp
popq %rbp
ret
Leh_func_end1:
There is no system call for it but you should be able to still make the call

Assembly - Thread Safe Local Variables

I'm trying to have thread-safe local variables in an assembly program.
I've searched on the net, but I haven't found anything simple.
I'm currently using GCC assembler, as the program is a mix of C code and assembly, but the final program will contain code for multiple-platforms / calling conventions.
For now, I've declared my variables using the .lcomm pseudo-op.
As I understand it, those variables will be placed in the .bss section.
So I guess they will be shared by all threads.
Is there a way to have a kind of TLS variables directly in assembly, or should I use platform-specific implementations, like pthread or __declspec on Windows?
Hope it's clear enough. Don't hesitate to ask if more information is needed.
Thanks to everyone,
EDIT
Here's the code in question:
.lcomm stack0, 8
.lcomm stack1, 8
.globl _XSRuntime_CallMethod
_XSRuntime_CallMethod:
pushq %rbp
movq %rsp, %rbp
xor %rax, %rax
popq stack0( %rip )
popq stack1( %rip )
callq *%rdi
pushq stack1( %rip )
pushq stack0( %rip )
leave
ret
Basically, it's used to redirect a call to a C function.
The C prototype is:
extern uint64_t XSRuntime_CallMethod( void ( *m )( void * self, ... ), ... );
It takes a function pointer as first argument, hence the callq *%rdi, as I'm testing this with system V ABI.
The assembly code is very simple, and I'd like to keep it that way, so it can be easily maintainable.
The question is: how to make the stack0 and stack1 variables thread safe.
Not that familiar with assembler so:
.lcomm stack0, 8
.lcomm stack1, 8
.globl _XSRuntime_CallMethod
_XSRuntime_CallMethod:
pushq %rbp // save BP
movq %rsp, %rbp // load BP with SP
xor %rax, %rax // clear AX
popq stack0( %rip ) // pop return address into STACK0
popq stack1( %rip ) // pop flags into stack1
callq *%rdi // call the indirect procedure, so putting flags/return to XSRuntime_CallMethod onto stack
pushq stack1( %rip ) // put caller flags onto stack
pushq stack0( %rip ) // put caller return onto stack
leave // clean passed parameters from stack
ret // and back to caller
Is this how it works, yes??
If so, would it not be easier to just jump to the indirect procedure, rather than calling it? You then don't need any extra variables to hold the caller flags/return & the indirected procedure returns directly to the caller.
Just a suggestion - while since I did assembler.
If you have to store the caller address somewhere, dec the SP, (enter?), and use the stack frame. Anything else is likely to be thread-unsafe at some point.
Rgds,
Martin
Well, with a TLS maybee not thread-unsafe, but what about any recursive calls? You end up with another stack in the TLS to cover this, so you may as well use the 'SP' stack
Martin
How do you think the compiler implements thread-local variables? Try compiling such a program with -S or /FAs and you'll see. Hint: it has to rely on OS-specific APIs or other details to get access to TLS storage. Sometimes the preparation steps are hidden in the CRT but there's no single way to do it.
For example, here's how recent MSVC does it:
_TLS SEGMENT
?number##3HA DD 01H DUP (?) ; number
_TLS ENDS
EXTRN __tls_array:DWORD
EXTRN __tls_index:DWORD
_TEXT SEGMENT
[...]
mov eax, DWORD PTR __tls_index
mov ecx, DWORD PTR fs:__tls_array
mov edx, DWORD PTR [ecx+eax*4]
mov eax, DWORD PTR ?number##3HA[edx]
As you can see, it uses special variables which are initialized by the CRT.
On recent Linux, GCC can use TLS-specific relocations:
.globl number
.section .tbss,"awT",#nobits
number:
.zero 4
.text
[...]
movl %gs:number#NTPOFF, %eax
If you want portability, it's best not to rely on such OS-specific details but use a generic API such as pthread or use stack-based approach proposed by Martin. But I guess if you wanted portability you wouldn't use assembler :)
?? 'Classic' local variables, ie. parameters/variables/results accessed by stack offsets, are inherently thread-safe.
If you need a 'TLS' that is platform-agnostic, pass some suitable struct/class instance into all threads, either as the creation parameter, in a thread field before resuming all the threads, first message to the threads input queue or whatever...
Rgds,
Martin
As previously mentioned, local variables (stack-based) are inherently thread-safe because each thread has its own stack.
A thread-safe variable which is reachable by all threads (not stack-based) is probably best implemented using a spin-lock (or the equivalent in the Windows NT engine, a critical section). Such a variable must be locked before access, accessed and then unlocked. One variant could be that reads are free, but writes must be framed by locking/unlocking.
AFAIK only compilers themselves don't implement thread-safe variables. Instead they provide lib functions which access the required OS functionality.
You should probably be using(placing calls to) the TlsAlloc & TlsFree (or their other OS equivalents) to do something like this. the returned indices can be stored in global set once, read-only vars for easy use.
Depending on what the vars hold and what the code using them does, you might be able to get away with atomic ops, but that might yield problems of its own.

Help with understanding a very basic main() disassembly in GDB

Heyo,
I have written this very basic main function to experiment with disassembly and also to see and hopefully understand what is going on at the lower level:
int main() {
return 6;
}
Using gdb to disas main produces this:
0x08048374 <main+0>: lea 0x4(%esp),%ecx
0x08048378 <main+4>: and $0xfffffff0,%esp
0x0804837b <main+7>: pushl -0x4(%ecx)
0x0804837e <main+10>: push %ebp
0x0804837f <main+11>: mov %esp,%ebp
0x08048381 <main+13>: push %ecx
0x08048382 <main+14>: mov $0x6,%eax
0x08048387 <main+19>: pop %ecx
0x08048388 <main+20>: pop %ebp
0x08048389 <main+21>: lea -0x4(%ecx),%esp
0x0804838c <main+24>: ret
Here is my best guess as to what I think is going on and what I need help with line-by-line:
lea 0x4(%esp),%ecx
Load the address of esp + 4 into ecx. Why do we add 4 to esp?
I read somewhere that this is the address of the command line arguments. But when I did x/d $ecx I get the value of argc. Where are the actual command line argument values stored?
and $0xfffffff0,%esp
Align stack
pushl -0x4(%ecx)
Push the address of where esp was originally onto the stack. What is the purpose of this?
push %ebp
Push the base pointer onto the stack
mov %esp,%ebp
Move the current stack pointer into the base pointer
push %ecx
Push the address of original esp + 4 on to stack. Why?
mov $0x6,%eax
I wanted to return 6 here so i'm guessing the return value is stored in eax?
pop %ecx
Restore ecx to value that is on the stack. Why would we want ecx to be esp + 4 when we return?
pop %ebp
Restore ebp to value that is on the stack
lea -0x4(%ecx),%esp
Restore esp to it's original value
ret
I am a n00b when it comes to assembly so any help would be great! Also if you see any false statements about what I think is going on please correct me.
Thanks a bunch! :]
Stack frames
The code at the beginning of the function body:
push %ebp
mov %esp, %ebp
is to create the so-called stack frame, which is a "solid ground" for referencing parameters and objects local to the procedure. The %ebp register is used (as its name indicates) as a base pointer, which points to the base (or bottom) of the local stack inside the procedure.
After entering the procedure, the stack pointer register (%esp) points to the return address stored on the stack by the call instruction (it is the address of the instruction just after the call). If you'd just invoke ret now, this address would be popped from the stack into the %eip (instruction pointer) and the code would execute further from that address (of the next instruction after the call). But we don't return yet, do we? ;-)
You then push %ebp register to save its previous value somewhere and not lose it, because you'll use it for something shortly. (BTW, it usually contains the base pointer of the caller function, and when you peek that value, you'll find a previously stored %ebp, which would be again a base pointer of the function one level higher, so you can trace the call stack that way.) When you save the %ebp, you can then store the current %esp (stack pointer) there, so that %ebp will point to the same address: the base of the current local stack. The %esp will move back and forth inside the procedure when you'll be pushing and popping values on the stack or reserving & freeing local variables. But %ebp will stay fixed, still pointing to the base of the local stack frame.
Accessing parameters
Parameters passed to the procedure by the caller are "burried just uner the ground" (that is, they have positive offsets relative to the base, because stack grows down). You have in %ebp the address of the base of the local stack, where lies the previous value of the %ebp. Below it (that is, at 4(%ebp) lies the return address. So the first parameter will be at 8(%ebp), the second at 12(%ebp) and so on.
Local variables
And local variables could be allocated on the stack above the base (that is, they'd have negative offsets relative to the base). Just subtract N to the %esp and you've just allocated N bytes on the stack for local variables, by moving the top of the stack above (or, precisely, below) this region :-) You can refer to this area by negative offsets relative to %ebp, i.e. -4(%ebp) is the first word, -8(%ebp) is second etc. Remember that (%ebp) points to the base of the local stack, where the previous %ebp value has been saved. So remember to restore the stack to the previous position before you try to restore the %ebp through pop %ebp at the end of the procedure. You can do it two ways:
1. You can free only the local variables by adding back the N to the %esp (stack pointer), that is, moving the top of the stack as if these local variables had never been there. (Well, their values will stay on the stack, but they'll be considered "freed" and could be overwritten by subsequent pushes, so it's no longer safe to refer them. They're dead bodies ;-J )
2. You can flush the stack down to the ground and free all local space by simply restoring the %esp from the %ebp which has been fixed earlier to the base of the stack. It'll restore the stack pointer to the state it has just after entering the procedure and saving the %esp into %ebp. It's like loading the previously saved game when you've messed something ;-)
Turning off frame pointers
It's possible to have a less messy assembly from gcc -S by adding a switch -fomit-frame-pointer. It tells GCC to not assemble any code for setting/resetting the stack frame until it's really needed for something. Just remember that it can confuse debuggers, because they usually depend on the stack frame being there to be able to track up the call stack. But it won't break anything if you don't need to debug this binary. It's perfectly fine for release targets and it saves some spacetime.
Call Frame Information
Sometimes you can meet some strange assembler directives starting from .cfi interleaved with the function header. This is a so-called Call Frame Information. It's used by debuggers to track the function calls. But it's also used for exception handling in high-level languages, which needs stack unwinding and other call-stack-based manipulations. You can turn it off too in your assembly, by adding a switch -fno-dwarf2-cfi-asm. This tells the GCC to use plain old labels instead of those strange .cfi directives, and it adds a special data structures at the end of your assembly, refering to those labels. This doesn't turn off the CFI, just changes the format to more "transparent" one: the CFI tables are then visible to the programmer.
You did pretty good with your interpretation. When a function is called, the return address is automatically pushed to the stack, which is why argc, the first argument, has been pushed back to 4(%esp). argv would start at 8(%esp), with a pointer for each argument, followed by a null pointer. This function pushes the old value of %esp to the stack so that it can contain the original, unaligned value upon returned. The value of %ecx at return doesn't matter, which is why it is used as temporary storage for the %esp reference. Other than that, you are correct with everything.
Regarding your first question (where are stored the command line arguments), arguments to functions are right before ebp. I must say, your "real" main begins at < main + 10 >, where it pushes ebp and moves esp to ebp. I think that gcc messes everything up with all that leas just to replace the usual operations (addictions and subtractions) on esp before and after functions call. Usually a routine looks like this (simple function I did as an example):
0x080483b4 <+0>: push %ebp
0x080483b5 <+1>: mov %esp,%ebp
0x080483b7 <+3>: sub $0x10,%esp # room for local variables
0x080483ba <+6>: mov 0xc(%ebp),%eax # get arg2
0x080483bd <+9>: mov 0x8(%ebp),%edx # and arg1
0x080483c0 <+12>: lea (%edx,%eax,1),%eax # just add them
0x080483c3 <+15>: mov %eax,-0x4(%ebp) # store in local var
0x080483c6 <+18>: mov -0x4(%ebp),%eax # and return the sum
0x080483c9 <+21>: leave
0x080483ca <+22>: ret
Perhaps you've enabled some optimizations, which could make the code trickier.
Finally yes, the return value is stored in eax. Your interpretation is quite correct anyway.
The only thing I think that's outstanding from your original questions is why the following statements exist in your code:
0x08048381 <main+13>: push %ecx
0x08048382 <main+14>: mov $0x6,%eax
0x08048387 <main+19>: pop %ecx
The push and pop of %ecx at <main+13> and <main+19> don't seem to make much sense - and they don't really do anything in this example, but consider the case where your code invokes function calls.
There's no way for the system to guarantee that the calls to other functions - which will set up their own stack activation frames - won't reset register values. In fact they probably will. The code therefore sets up a saved register section on the stack where any registers used by the code (other than %esp and %ebp which are already saved though the regular stack setup) are stored in the stack before possibly handing control over to function calls in the "meat" of the current code block.
When these potential calls return, the system then pops the values off the stack to restore the pre-call register values. If you were writing assembler directly rather than compiling, you'd be responsible for storing and retrieving these register values, yourself.
In the case of your example code, however, there are no function calls - only a single instruction at <main+14> where you're setting the return value, but the compiler can't know that, and preserves its registers as usual.
It would be interesting to see what would happen here if you added C statements which pushed other values onto the stack after <main+14>. If I'm right about this being a saved register section of the stack, you'd expect the compiler to insert automatic pop statements prior to <main+19> in order to clear these values.

How does a syscall actually happen on linux?

Inspired by this question
How can I force GDB to disassemble?
and related to this one
What is INT 21h?
How does an actually system call happen under linux? what happens when the call is performed, until the actual kernel routine is invoked ?
Assuming we're talking about x86:
The ID of the system call is deposited into the EAX register
Any arguments required by the system call are deposited into the locations dictated by the system call. For example, some system calls expect their argument to reside in the EBX register. Others may expect their argument to be sitting on the top of the stack.
An INT 0x80 interrupt is invoked.
The Linux kernel services the system call identified by the ID in the EAX register, depositing any results in pre-determined locations.
The calling code makes use of any results.
I may be a bit rusty at this, it's been a few years...
The given answers are correct but I would like to add that there are more mechanisms to enter kernel mode. Every recent kernel maps the "vsyscall" page in every process' address space. It contains little more than the most efficient syscall trap method.
For example on a regular 32 bit system it could contain:
0xffffe000: int $0x80
0xffffe002: ret
But on my 64-bitsystem I have access to the way more efficient method using the syscall/sysenter instructions
0xffffe000: push %ecx
0xffffe001: push %edx
0xffffe002: push %ebp
0xffffe003: mov %esp,%ebp
0xffffe005: sysenter
0xffffe007: nop
0xffffe008: nop
0xffffe009: nop
0xffffe00a: nop
0xffffe00b: nop
0xffffe00c: nop
0xffffe00d: nop
0xffffe00e: jmp 0xffffe003
0xffffe010: pop %ebp
0xffffe011: pop %edx
0xffffe012: pop %ecx
0xffffe013: ret
This vsyscall page also maps some systemcalls that can be done without a context switch. I know certain gettimeofday, time and getcpu are mapped there, but I imagine getpid could fit in there just as well.
This is already answered at
How is the system call in Linux implemented?
Probably did not match with this question because of the differing "syscall" term usage.
Basically, its very simple: Somewhere in memory lies a table where each syscall number and the address of the corresponding handler is stored (see http://lxr.linux.no/linux+v2.6.30/arch/x86/kernel/syscall_table_32.S for the x86 version)
The INT 0x80 interrupt handler then just takes the arguments out of the registers, puts them on the (kernel) stack, and calls the appropriate syscall handler.

Resources