Stack location in Linux with ASLR

On Linux with ASLR enabled, is there a range of addresses in which the user stack lies? What about the heap and instruction addresses (the text section)?
In general, is it possible to look at an address and tell whether it belongs to data or to code?
Edit:
I am trying to write a Pintool that looks at the EIP after a return and checks if the EIP points to a data area. Let's assume that NX is not enabled on this system.
For some reason, this was downvoted. Fortunately, the answer can be found here:
https://security.stackexchange.com/questions/185315/stack-location-range-on-linux-for-user-process/185330#185330

cat /proc/self/maps will show the initial location of the main thread's stack. This can be inaccurate for (at least) the following reasons:
you're not in the main thread
any part of the program was built with the -fsplit-stack option, or you call a library that does something similar
you're within a signal handler that requests the sigaltstack stack instead
you do weird alloca tricks like CHICKEN Scheme does to use the stack as a heap
...
Also note that the general areas are not fully random. See the AddressSanitizer project for something that takes advantage of this.
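As a rough illustration of the lookup described above, here is a minimal Python sketch that parses text in the /proc/<pid>/maps format and classifies an address. The sample mapping, the function names, and the rule "writable means data" are assumptions made for this example (the rule is chosen because, without NX, data pages may also show up as executable); it inherits every caveat from the list above, such as thread stacks and alternate signal stacks not being labelled [stack].

```python
# Made-up sample in /proc/<pid>/maps format; real addresses differ per run under ASLR.
SAMPLE_MAPS = """\
00400000-0040c000 r-xp 00000000 08:01 1234 /bin/cat
0060c000-0060d000 rw-p 0000c000 08:01 1234 /bin/cat
01a3b000-01a5c000 rw-p 00000000 00:00 0 [heap]
7ffd1c2e0000-7ffd1c301000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps(text):
    """Return a list of (start, end, perms, name) tuples, one per mapping."""
    regions = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        name = fields[5].strip() if len(fields) > 5 else ""
        regions.append((start, end, fields[1], name))
    return regions


def classify(addr, regions):
    """Heuristic: any writable mapping counts as data, even if it is also executable."""
    for start, end, perms, name in regions:
        if start <= addr < end:
            if "w" in perms:
                kind = "data"
            elif "x" in perms:
                kind = "code"
            else:
                kind = "read-only"
            return kind, name or "[anonymous]"
    return "unmapped", ""


regions = parse_maps(SAMPLE_MAPS)
print(classify(0x400100, regions))
print(classify(0x7FFD1C2F0000, regions))
```

For a live process you would pass the contents of /proc/self/maps (or /proc/<pid>/maps) to parse_maps instead of the sample. A Pintool would need the equivalent lookup in C++ and would have to refresh it whenever mappings change.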

Related

How do different commands get executed in CPU x86-64 registers?

Years ago a teacher once said to class that 'everything that gets parsed through the CPU can also be exploited'.
Back then I didn't know much about the topic, but now the statement is nagging at me, and I lack the vocabulary to find an answer on the internet myself, so I kindly ask you for help.
We had the lesson about 'cat', 'grep' and 'less' and she said that in the worst case even those commands can cause harm if we parse the wrong content through it.
I don't really understand what she meant by that. I do know how CPU registers work; we also had to write an educational buffer overflow, so I have seen assembly code in the registers as well.
I still don't get the following:
How do commands get executed in the CPU at all? E.g. I use 'cat', so somewhere there will be a call of the command. But how does the data I enter get parsed to the CPU? If I 'cat' a .txt file which contains 'hello world', can I find that string in hex somewhere in the CPU registers? And if yes:
How does the CPU know that said string is NOT to be executed?
Could you think of any scenario where the above commands could get exploited? AFAIK only text gets parsed through them, so how could that be exploitable? What do I have to be careful about?
Thanks a lot!
Machine code executes by being fetched by the instruction-fetch part of the CPU, at the address pointed to by RIP, the instruction-pointer. CPUs can only execute machine code from memory.
General-purpose registers get loaded with data from data load/store instructions, like mov eax, [rdi]. Having data in registers is totally unrelated to having it execute as machine code. Remember that RIP is a pointer, not actual machine-code bytes. (RIP can be set with jump instructions, including indirect jump to copy a GP register into it, or ret to pop the stack into it).
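To make the distinction concrete, here is a toy fetch/execute loop in Python. The three opcodes and the single register are invented for this sketch and are not x86 encodings. It only illustrates the point above: a byte becomes an instruction when the instruction pointer reaches it, not when it is loaded into a register.

```python
# Invented opcodes for a toy machine (not real x86 encodings).
HALT, LOAD, JMP = 0x00, 0x01, 0x02


def run(memory, ip=0, max_steps=100):
    """Execute the toy program in `memory`; return (register, trace)."""
    reg = 0
    trace = []
    for _ in range(max_steps):
        opcode = memory[ip]                  # instruction fetch: only ever at ip
        if opcode == LOAD:                   # LOAD addr: copy a data byte into reg
            reg = memory[memory[ip + 1]]
            trace.append(("LOAD", reg))
            ip += 2
        elif opcode == JMP:                  # JMP addr: set the instruction pointer
            ip = memory[ip + 1]
            trace.append(("JMP", ip))
        elif opcode == HALT:
            trace.append(("HALT", reg))
            break
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at {ip}")
    return reg, trace


# Bytes 4..5 are identical in both programs: 0x02 0x02, which decodes as "JMP 2".
as_data = bytes([LOAD, 4, HALT, 0, JMP, 2])  # byte 4 is only loaded into reg
as_code = bytes([JMP, 4, HALT, 0, JMP, 2])   # ip is pointed at byte 4

print(run(as_data))
print(run(as_code))
```

In the first program the byte at address 4 ends up in the register and nothing more happens; in the second the same bytes run as an instruction because a jump moved the instruction pointer there.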
It would help to learn some basics of assembly language, because you seem to be missing some key concepts there. It's kind of hard to answer the security part of this question when the entire premise seems to be built on some misunderstanding of how computers work. (Which I don't think I can easily clear up here without writing a book on assembly language.) All I can really do is point you at CPU-architecture stuff that answers part of the title question of how instructions get executed. (Not from registers).
Related:
How does a computer distinguish between Data and Instructions?
How instructions are differentiated from data?
Modern Microprocessors: A 90-Minute Guide! covers the basic fetch/decode/execute cycle of simple pipelines. Modern CPUs might have more complex internals, but from a correctness / security POV are equivalent. (Except for exploits like Spectre and Meltdown that depend on speculative execution).
https://www.realworldtech.com/sandy-bridge/3/ is a deep-dive on Intel's Sandybridge microarchitecture. That page covering instruction-fetch shows how things really work under the hood in real CPUs. (AMD Zen is fairly similar.)
You keep using the word "parse", but I think you just mean "pass". You don't "parse content through" something, but you can "pass content through". Anyway no, cat usually doesn't involve copying or looking-at data in user-space, unless you run cat -n to add line numbers.
See Race condition when piping through x86-64 assembly program for an x86-64 Linux asm implementation of plain cat using read and write system calls. Nothing in it is data-dependent, except for the command-line arg. The data being copied is never loaded into CPU registers in user-space.
Inside the kernel, copy_to_user inside Linux's implementation of a read() system call on x86-64 will normally use rep movsb for the copy, not a loop with separate load/store, so even in kernel the data gets copied from the page-cache, pipe buffer, or whatever, to user-space without actually being in a register. (Same for write copying it to whatever stdout is connected to.)
Other commands, like less and grep, would load data into registers, but that doesn't directly introduce any risk of it being executed as code.
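For a feel of how little plain cat does with the data, here is a rough Python sketch of the same read/write loop. It is not the asm implementation linked above and not GNU cat's source; cat_fd and its buffer size are names and values chosen for this example. The loop only looks at how many bytes came back, never at their contents.

```python
import os


def cat_fd(src_fd, dst_fd, bufsize=65536):
    """Copy src_fd to dst_fd until EOF using only read() and write(); return bytes copied."""
    total = 0
    while True:
        chunk = os.read(src_fd, bufsize)     # read(2); an empty result means EOF
        if not chunk:
            return total
        view = memoryview(chunk)
        while view:                          # write(2) may accept fewer bytes than offered
            written = os.write(dst_fd, view)
            view = view[written:]
            total += written


if __name__ == "__main__":
    # Demo on pipes, so the sketch does not depend on any particular file.
    src_r, src_w = os.pipe()
    dst_r, dst_w = os.pipe()
    os.write(src_w, b"hello world\n")
    os.close(src_w)
    cat_fd(src_r, dst_w)
    os.close(dst_w)
    print(os.read(dst_r, 100))
```

Whatever bytes the file contains, the control flow is the same, which is why content alone cannot steer a program like this.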
Most of this has already been answered by Peter. However, I would like to add a few things.
How do commands get executed in the CPU at all? E.g. I use 'cat', so somewhere there will be a call of the command. But how does the data I enter get parsed to the CPU? If I 'cat' a .txt file which contains 'hello world', can I find that string in hex somewhere in the CPU registers?
The CPU does not execute cat.c directly; you could check the source code to get an in-depth view.
What actually happens is that the compiler translates the C source into machine instructions, and those are what the CPU executes. The instructions themselves are not vulnerable, because all they do is move data around and flip bits. Most vulnerabilities come from memory-management bugs, and cat has been vulnerable in the past.
How does the CPU know that said string is NOT to be executed?
It does not. It's the job of the operating system to tell what is to be executed and what is not (for example, by marking pages as executable or non-executable).
Could you think of any scenario where the above commands could get exploited? AFAIK only text gets parsed through them, so how could that be exploitable? What do I have to be careful about?
You have to be careful about how the program reads the text file into memory. You could even write your own interpreter that executes a .txt file; the interpreter would then be telling the CPU how to carry out each instruction.

Is I/O with O_DIRECT less safe because it lacks address checking?

As described in the book Understanding the Linux Kernel, normal I/O (without O_DIRECT) copies data by calling copy_to_user() or copy_from_user(). So besides access_ok(), a more complete check of the user address parameter is done by a mechanism the book calls "Dynamic Address Checking: The Fix-up Code", which works as follows:
The PC addresses of the exact copy instructions inside copy_to_user() and
copy_from_user() are recorded in __ex_table. When a wrong user address is
passed in through a system call, the page fault handler performs the
check below, which makes the system call safer:
———————————————————————
if PC is in __ex_table then just kill the user program
else kernel panic
———————————————————————
My question: since O_DIRECT mostly does not do a user/kernel copy, might it be less safe from the kernel's perspective because it lacks the dynamic address checking above?
Thanks for help.
Edit:
Per Tsyvarev's reply, "less safe" might not be precise enough, so my question can be elaborated as:
When a program does direct I/O and provides a wrong address, which of the cases below will happen:
the program will be killed (a program without O_DIRECT will be killed, so if a program with O_DIRECT is not killed, I considered it less safe in my question above)
or the kernel will just panic
or nothing will happen
Maybe my question is too code-specific, and I need to do some experiments and code reading to find the answer myself :)
thanks
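One way to start the experiment mentioned above is to hand read() a deliberately bad buffer address and see what comes back. The sketch below does this from Python via ctypes for the ordinary (non-O_DIRECT) path only. The helper name and the choice of address 1 as "certainly unmapped" are assumptions made for this example. On the systems I am aware of, the call simply fails with EFAULT, as documented in read(2), rather than the process being killed. Repeating it with a file opened with O_DIRECT and a suitably aligned bad address would be the next step and is not covered here.

```python
import ctypes
import errno
import os

# Call libc's read() directly so that an arbitrary pointer value can be passed.
libc = ctypes.CDLL(None, use_errno=True)
libc.read.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t]
libc.read.restype = ctypes.c_ssize_t


def read_into_bad_address(bad_addr=1):
    """Ask the kernel to copy pipe data to `bad_addr`; return (return value, errno)."""
    r, w = os.pipe()
    try:
        os.write(w, b"data")                 # make sure there is something to copy
        ctypes.set_errno(0)
        ret = libc.read(r, ctypes.c_void_p(bad_addr), 4)
        return ret, ctypes.get_errno()
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    ret, err = read_into_bad_address()
    print(ret, errno.errorcode.get(err, err))
```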

Can anyone explain why a NO-OP slide is used in shellcoding?

An example where NO-OP slide is a must for the exploit to work would be really helpful.
An example of when it is a must is when you want an exploit to be portable when targeting a non-ASLR enabled executable/system. Consider a local privilege escalation exploit where you return to shellcode on the stack. Because the stack holds the environment, the shellcode on the stack will be at slightly different offsets from the top of the stack when executing from within different users' shells, or on different systems. By prefixing the shellcode with, for example, 64k nop instructions, you provide a large margin of error for the stack address since your code will execute the same whether you land on the first nop or the last one.
Using nops is generally not as useful when targeting ASLR-enabled systems, since data sections will be mapped in entirely different areas of memory.
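The margin-of-error argument can be shown with a small simulation. This Python sketch uses a harmless placeholder instead of real shellcode and just counts how many landing offsets end up at the start of the payload; the function names and the 64-byte sled are choices made for this example.

```python
NOP = 0x90                 # the x86 single-byte NOP opcode
PAYLOAD = b"PAYLOAD"       # harmless placeholder standing in for the shellcode


def build_buffer(sled_len, payload=PAYLOAD):
    """Return sled_len NOP bytes followed by the payload."""
    return bytes([NOP]) * sled_len + payload


def reaches_payload(buffer, landing, payload=PAYLOAD):
    """Slide over NOPs starting at `landing`; True if we stop exactly at the payload start."""
    pos = landing
    while pos < len(buffer) and buffer[pos] == NOP:
        pos += 1
    return buffer[pos:pos + len(payload)] == payload


def good_landings(buffer):
    """Count the landing offsets from which execution would reach the payload start."""
    return sum(reaches_payload(buffer, i) for i in range(len(buffer)))


print(good_landings(build_buffer(0)))    # no sled: only the exact address works
print(good_landings(build_buffer(64)))   # with a sled: every NOP is a valid target too
```

Each NOP added to the sled adds one more return address that still works, which is the whole reason for the sled when the exact stack offset is unknown.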

Understanding GDB and Segfault Messages

I was recently debugging an application that was segfaulting on a regular basis--I solved the problem, which was relatively mundane (reading from a null pointer), but I have a few residual questions I've been unable to solve on my own.
The gdb stack trace began like this in most cases:
0x00007fdff330059f in __strlen_sse42 () from /lib64/libc.so.6
Using information from /proc/[my proc id]/maps to obtain the base address of the shared library, I could see that the problem always occurred at the same instruction of the shared library, at offset 0x13259f, which is
pcmpeqb (%rdi),%xmm1 (gdb)
So far, so good. But then the OS (Linux) would also write an error message to /var/log/messages that looks like this:
[3540502.783205] node[24638]: segfault at 0 ip 00007f8abbe6459f sp 00007fff7bf2f148 error 4 in libc-2.12.so[7f8abbd32000+189000]
Which confuses me. On the one hand, the kernel correctly identifies the fault (error 4, a user-mode read of a page that is not present), and, by subtracting the base address of the shared library from the instruction pointer, we arrive at the same relative offset, 0x13259f, as we do with gdb. But the library the kernel identifies is different, the address of the instruction is different, and the function and instruction within that library are different. That is, the instruction within libc-2.12.so is
0x13259f <__memset_sse2+911>: movdqa %xmm0,-0x43(%edx)
So, my question is, how can gdb and the kernel message agree on the type of fault, and on the offset of the instruction relative to the base address of the shared library, but disagree on the address of the instruction pointer and the shared library being used?
But the library the kernel identifies is different,
No, it isn't. Do ls -l /lib64/libc.so.6, and you'll see that it's a symlink to libc-2.12.so.
the address of the instruction is different
The kernel message is for a different execution from the one you've observed in GDB, and address randomization caused libc-2.12.so to be loaded at a different base address.
and the function and instruction within that library is different. That is, the instruction within libc-2.12.so is 0x13259f <__memset_sse2+911>: movdqa %xmm0,-0x43(%edx)
It is likely that you looked at a different libc-2.12.so from the one that is actually used.
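The offset arithmetic from the question can be reproduced with a few lines of Python. This is only a sketch: the regular expression is written against the single log line quoted above, and other kernel versions may format the message differently. The bit meanings in decode_error are the low three bits of the x86 page-fault error code.

```python
import re

# Pattern written for the log line quoted in the question; treat it as an assumption.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the fields of a kernel segfault line plus the offset of ip inside the object."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = {k: (v if k == "obj" else int(v, 16)) for k, v in m.groupdict().items()}
    info["offset"] = info["ip"] - info["base"]   # position of the faulting instruction in the file mapping
    return info


def decode_error(code):
    """Decode the low three bits of the x86 page-fault error code."""
    return (
        "protection violation" if code & 1 else "page not present",
        "write" if code & 2 else "read",
        "user mode" if code & 4 else "kernel mode",
    )


LINE = ("[3540502.783205] node[24638]: segfault at 0 ip 00007f8abbe6459f "
        "sp 00007fff7bf2f148 error 4 in libc-2.12.so[7f8abbd32000+189000]")
info = parse_segfault(LINE)
print(info["obj"], hex(info["offset"]), decode_error(info["error"]))
```

The same subtraction applied to the gdb address, 0x7fdff330059f - 0x13259f, gives a different base, which is what you would expect if the two crashes came from separate runs with the library loaded at randomized addresses.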

Why does the stack have to be page aligned?

In Linux, I've tried (just for fun) to modify the kernel source in process.c to create a stack address that has more entropy, in particular the line:
sp -= get_random_int() % 8192;
When I change this too much, the kernel halts or I get some seemingly undefined behavior. I'm guessing that this causes PAGE_ALIGN() to fail in some way? I'm not that interested in why PAGE_ALIGN() in particular fails, or exactly what piece of code in the kernel that fails (although that too would be nice to know); I'm more interested in why the stack must reside in a particular region at all. What is the architectural reason and motivation behind this? Does this have something to do with how GDT/LDT works in protected mode?
Just to make clear what I'm asking:
Why does the stack have to have the form 0xbfXXXXXX (on 32-bit)? Why cannot the stack be e.g. 0xaaXXXXXX, or any other value?
There is a limit in do_page_fault() on how far outside the stack VMA an access can be before it is considered a bad access. Perhaps you're hitting that?
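For reference, the arithmetic of the line quoted in the question can be modelled in Python. This is a sketch under assumptions: from memory, the x86 helper containing that line also rounds the result down to a 16-byte boundary, and the starting address 0xbffff000 is just an example value of the 0xbfXXXXXX form. It says nothing about the page-fault limit itself, only about how far this one line can move the stack pointer.

```python
def align_stack_sketch(sp, rand_value):
    """Model of `sp -= get_random_int() % 8192;` followed by 16-byte alignment (assumed)."""
    sp -= rand_value % 8192        # at most 8191 bytes, i.e. under two 4 KiB pages
    return sp & ~0xF               # round down to a 16-byte boundary


START_SP = 0xBFFFF000              # example value only
results = {align_stack_sketch(START_SP, r) for r in range(8192)}

print(hex(min(results)), hex(max(results)), len(results))
```

With the modulus at 8192 the stack pointer never moves more than two pages from where it started, so raising that value a lot shifts the initial stack pointer much further from where the rest of the setup code expects it.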
