Is it possible to use ROP when stack canaries are in place, without any other "helpers" to bypass the canary values? If so, could you please provide some resources/material that further explain how it'd be done when one is just using ROP without any other techniques?
It depends.
Computers have two memory access types:
Sequential access: starting from the buffer, you have to completely overwrite all the following bytes up to buf+len(payload)
Direct access: an arbitrary write. You can overwrite the value at the exact address you want
You were probably considering only the first case, and I'm pretty sure that, without any "helper", you can't bypass the canary. The only way would be to brute-force its value.
In case of an arbitrary write, which can happen with a format string vulnerability (an old class of vulnerability) or a heap vulnerability, you just have to overwrite the stack while avoiding the canary's address.
N.B.: an arbitrary write is still much harder to achieve.
In Linux, there are many ways to bypass security mechanisms (like NX and ASLR), except the canary. I found that the stack canary is generated by the Linux kernel in the boot_init_stack_canary() function in /arch/arm/include/asm/stackprotector.h. The random number is ultimately generated by the extract_entropy function, and it's related to environmental noise such as the keyboard and the time intervals between interrupts.
Are there any ways to bypass the canary security mechanism when exploiting a stack-overflow vulnerability?
In general, the technique for beating canaries is to use the correct value. Typically this can be done using either an information leak to obtain the correct value before overflowing the buffer onto the stack, or, in some cases where the process restarts, by brute-forcing the correct value.
I'm really new to this topic and only know some basic concepts. Nevertheless, there is a question that quite confuses me.
Solar Designer proposed the idea of return-to-libc in 1997 (http://seclists.org/bugtraq/1997/Aug/63). Return-to-user, from my understanding, didn't become popular until 2007 (http://seclists.org/dailydave/2007/q1/224).
However, return-to-user seems much easier than return-to-libc. So my question is, why did hackers spend so much effort in building a gadget chain by using libc, rather than simply using their own code in the user space when exploiting a kernel vulnerability?
I don't believe that they did not realize there are NULL pointer dereference vulnerabilities in the kernel.
Great question, and thank you for taking some time to research the topic before making a post that causes its readers to lose hope in the future of mankind (you might be able to tell I've read a few [exploit] tags tonight).
Anyway, there are two reasons:
return-to-libc is generic, provided you have the offsets. There is no need to programmatically or manually build either a return chain or scrape existing user functionality.
Partly because of linker-enabled relocations and partly because of history, the executable runtime of most programs running on a Linux system essentially demands the libc runtime, or at least a runtime that can correctly handle _start and cons. This formula still stands on Windows, just under the slightly different paradigm of return-to-kernel32/oleaut/etc., but it can actually be even more immediately powerful there, particularly for shellcode with length requirements, for reasons relating to the way in which system calls are invoked indirectly by kernel32-SSDT functions.
As a side note, if you are discussing NULL pointer dereferences in the kernel, you may be confusing this with return to vDSO space, which is actually subject to a different set of constraints than standard "mprotect and roll" userland is.
Ret2libc or ROP is a technique you deploy when you cannot return to your shellcode (jmp esp/whatever) because of memory protections (DEP/NX).
They serve different goals, and you may engage with both.
Return to libc
This is a buffer-overrun exploit where you have control over some input, and you know there exists a badly written program which will copy your input onto the stack and return through it. This allows you to start executing code on a machine you don't have a login to, or only very restricted access to.
Return to libc gives you the ability to control which code is executed, and generally results in you running a more natural piece of existing code within the process's address space.
Return from kernel
This is a privilege escalation attack. It takes code which is running on the machine, and exploits bugs in the kernel causing the kernel privileges to be applied to the user process.
Thus you may use return-to-libc to get your code running in a web-browser, and then return-to-user to perform restricted tasks.
Earlier shellcode
Before return-to-libc, there was straight shellcode, where the buffer overrun included the exploit code; with knowledge of where the stack would be, this could be run directly. This became somewhat obsolete with the NX bit in Windows. (x86 hardware was capable of marking a segment as executable, but not a page.)
Why is return-to-libc much earlier than return-to-user?
The goal in these attacks is to own the machine. Frequently, it was easy to find a vulnerability in a user-mode process which gave you access to what you needed; it is only with the hardening of the system, by reducing the privilege at the boundary and fixing the bugs in important programs, that the newer return-from-kernel became necessary.
I have been writing a small kernel lately based on the x86-64 architecture. When taking care of some user-space code, I realized I am virtually not using rbp. I then looked into some other things and found out that compilers are getting smarter these days and they really don't use rbp anymore. (I could be wrong here.)
I was wondering if the conventional use of rbp/ebp is no longer required in many instances, or am I wrong here? If that kind of usage is not required, can it be used like a general-purpose register?
Thanks
It is only needed if you have variable-length arrays in your stack frames (recording the array length would require more memory and more computations). It is no longer needed for unwinding because there is now metadata for that.
It is still useful if you are hand-writing entire assembly functions, but who does that? Assembly should only be used as glue to jump into a C (or whatever) function.
When studying Java I learned that Strings are not safe for storing passwords, since you can't manually clear the memory associated with them (you can't be sure they will eventually be gc'ed, interned strings may never be, and even after gc you can't be sure the physical memory contents were really wiped). Instead, I was to use char arrays, so I can zero them out after use. I've tried to search for similar practices in other languages and platforms, but so far I couldn't find the relevant info (usually all I see are code examples of passwords stored in strings with no mention of any security issue).
I'm particularly interested in the situation with browsers. I use jQuery a lot, and my usual approach is just to set the value of a password field to an empty string and forget about it:
$(myPasswordField).val("");
But I'm not 100% convinced it is enough. I also have no idea whether or not the strings used for intermediate access are safe (for instance, when I use $.ajax to send the password to the server). As for other languages, usually I see no mention of this issue (another language I'm interested in particular is Python).
I know questions attempting to build lists are controversial, but since this deals with a common security issue that is largely overlooked, IMHO it's worth it. If I'm mistaken, I'd be happy to know just from JavaScript (in browsers) and Python then. I was also unsure whether to ask here, at security.SE or at programmers.SE, but since it involves the actual code to safely perform the task (not a conceptual question) I believe this site is the best option.
Note: in low-level languages, or languages that unambiguously support characters as primitive types, the answer should be obvious (Edit: not really obvious, as @Gabe showed in his answer below). I'm asking for those high-level languages in which "everything is an object" or something like that, and also for those that perform automatic string interning behind the scenes (so you may create a security hole without realizing it, even if you're reasonably careful).
Update: according to an answer in a related question, even using char[] in Java is not guaranteed to be bulletproof (or .NET SecureString, for that matter), since the gc might move the array around, so its contents might stick around in memory even after clearing (SecureString at least stays at the same RAM address, guaranteeing clearing, but its consumers/producers might still leave traces).
I guess @NiklasB. is right: even though the vulnerability exists, the likelihood of an exploit is low and the difficulty of preventing it is high, which might be the reason this issue is mostly ignored. I wish I could find at least some reference to this problem concerning browsers, but googling for it has been fruitless so far (does this scenario at least have a name?).
The .NET solution to this is SecureString.
A SecureString object is similar to a String object in that it has a text value. However, the value of a SecureString object is automatically encrypted, can be modified until your application marks it as read-only, and can be deleted from computer memory by either your application or the .NET Framework garbage collector.
Note that even for low-level languages like C, the answer isn't as obvious as it seems. Modern compilers can determine that you are writing to the string (zeroing it out) but never reading it afterwards, and just optimize away the zeroing. To prevent the security measure from being optimized away, Windows provides SecureZeroMemory.
For Python, there's no way to do that, according to this answer. A possibility would be using lists of characters (as length-1 strings, or maybe code units as integers) instead of strings, so you can overwrite the list after use, but that would require every piece of code that touches it to support this format (if even a single one of them creates a string with its contents, it's over).
There is also a mention of a method using ctypes, but the link is broken, so I'm unaware of its contents. This other answer also refers to it, but there's not a lot of detail.
I've heard the theory. Address Space Layout Randomization takes libraries and loads them at randomized locations in the virtual address space, so that in case a hacker finds a hole in your program, he doesn't have a pre-known address to execute a return-to-libc attack against, for example. But after thinking about it for a few seconds, it doesn't make any sense as a defensive measure.
Let's say that our hypothetical TargetLib (libc or anything else the hacker is looking for) is loaded at a randomized address instead of a deterministic one. Now the hacker doesn't know ahead of time where TargetLib and the routines inside it are, but neither does the application code. It needs to have some sort of lookup table somewhere in the binary in order to find the routines inside of TargetLib, and that has to be at a deterministic location. (Or at a random location, pointed to by something else. You can add as many indirections as you want, but eventually you have to start at a known location.)
This means that instead of pointing his attack code at the known location of TargetLib, all the hacker needs to do is point his attack code at the application's lookup table's entry for TargetLib and dereference the pointer to the target routine, and the attack proceeds unimpeded.
Is there something about the way ASLR works that I don't understand? Because as described, I don't see how it's anything more than a speed bump, providing the image of security but no actual substance. Am I missing something?
I believe that this is effective because it changes the base address of the shared library. Recall that imported functions from a shared library are patched into your executable image when it is loaded; therefore, there is no table per se, just specific addresses pointing at data and code scattered throughout the program's code.
It raises the bar for an effective attack because it turns a simple buffer overrun (where the return address on the stack can be set) into one where the overrun must contain code to determine the correct location and then jmp to it. Presumably this just makes it harder.
Virtually all DLLs in Windows are compiled for a base address that they will likely not run at and will be relocated anyway, but the core Windows ones tend to have their base addresses optimized so that relocation is not needed.
I don't know if I got your question correctly, but I'll explain when ASLR is effective and when it is not.
Let's say that we have app.exe and TargetLib.dll.
app.exe is using (linked to) TargetLib.dll.
To make the explanation simple, let's assume that the virtual address space only has these 2 modules.
If both are ASLR-enabled, app.exe's base address is unknown. It may resolve some function-call addresses when it is loaded, but an attacker knows neither where the functions are nor where the resolved variables are. The same thing happens when TargetLib.dll is loaded.
Even though app.exe has a lookup table, an attacker does not know where the table is.
Since an attacker cannot tell what the content of a specific address is, he must attack the application without using any fixed-address information. That is usually harder with the usual attack methods like stack overflow, heap overflow, use-after-free...
On the other hand, if app.exe is NOT ASLR-enabled, it is much easier for an attacker to exploit the application, because there may be a call to an interesting API at a specific address in app.exe, and the attacker can use that address as a target to jump to. (Attacking an application usually starts from jumping to an arbitrary address.)
Supplementation:
You may already understand this, but I want to make one thing clear.
When an attacker exploits an application through a vulnerability like memory corruption, he is usually forced to use fixed-address jump instructions; he cannot use relative-address jumps for the exploit. This is the reason why ASLR is really effective against such exploits.