I need to do some brute-force searching in the process VA space for my study, and hence would like to limit my heap area's virtual address range. My OS course told me that the heap is anywhere between the data and stack pages. So I want to shrink my process VA range by doing the following:
Have a custom linker script that places the start and end of data somewhere very high in the address range (0x7f45f88a6000)
Tweak fs/binfmt_elf.c to have stack top as (0x8f45f88a6000) instead of randomly picking.
Assume my program uses only mmap with NULL as addresses
Can I safely assume that my heap (brk) will fall within this address range? Also, can I assume all mmap(NULL, other args) calls will return addresses within this range?
If not, what is the fix for this? I am willing to change the kernel source code, but where?
Does the Linux kernel allocate one big range, and let the heap and stack grow in opposite directions from the start/end addresses of this range, so that if you know the address of one of them you know the other, or are the two areas independent?
No, it does not, for security reasons. If they were related in any way it would be a severe security flaw.
On the other hand, libc is the one responsible for allocating these two areas using mmap(), not the kernel... or at least not directly. Each area has its own call to mmap() (including libraries), and the kernel gives partially random addresses for each call.
You can see in /proc/$pid/maps the different areas allocated for a specific program with $pid as process id.
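To make the /proc/$pid/maps suggestion concrete, here is a minimal sketch that parses that file's text into regions and prints the heap and stack ranges of the current process. The function name parse_maps and the dictionary keys are my own choices for illustration, and it assumes the usual "start-end perms offset dev inode pathname" line layout.

```python
import os

def parse_maps(text):
    """Parse the text of a /proc/<pid>/maps file into a list of dicts."""
    regions = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        regions.append({
            "start": start,
            "end": end,
            "perms": fields[1],
            "name": fields[5].strip() if len(fields) > 5 else "",
        })
    return regions

# Only meaningful on Linux, where /proc/self/maps describes this process.
if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as f:
        for r in parse_maps(f.read()):
            if r["name"] in ("[heap]", "[stack]"):
                print(f'{r["name"]:8} {r["start"]:#x}-{r["end"]:#x} {r["perms"]}')
```

Running it several times with address space layout randomization enabled should show the ranges moving between runs.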
Currently, I'm trying to figure out how to get the virtual address (VA) of a specific process in the Linux kernel, since there are several functions taking VA as an argument related to different page directories, including pgd_offset(), pgd_index(), p4d_offset(), p4d_index()...
Could anyone explain what these functions do, including xxx_offset() and xxx_index() (xxx: pgd, p4d, pmd...)? And how are they used?
What does the VA mean when it is passed as an argument to the functions mentioned above? Is it the virtual address of the process? And how can I get the VA of a specific process? I already know that we can use the process's task_struct->mm->mmap to get the range of the virtual address space, but I have no idea how to get a specific virtual address.
Is the task_struct->mm->pgd_t indicating the base address of PGD_directory?
Your question doesn't really make sense. You don't "get a virtual address of a process". A process has a virtual address space that serves as a virtual memory map for data, code, stack, heap, etc.
Those functions take a single virtual address within the process virtual address space and help with walking through the page tables to find its page table entry and then its physical address (or to check page table entry flags). In Linux, there are traditionally 4 page table levels to go through to get to the page table entry. Normally the levels are pgd (page global directory), pud (page upper directory), pmd (page middle directory), and pte (page table entry). More recently, p4d was added as an extra level between pgd and pud, to support 5-level paging. On x86, the address of the top-level page table is stored in the CR3 register. So you use that address to access the directory, then use pgd_index and pgd_offset to find the entry that leads to the next level (p4d) you need to look into, and repeat until you hit the pte. A useful file to see this in action is mm/pagewalk.c.
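As a rough illustration of what the xxx_index() helpers compute, the sketch below splits a virtual address into per-level table indices. It is not kernel code: it assumes x86-64 with 4 KiB pages and 4-level paging (so p4d is folded away), 9 index bits per level, and the names PAGE_SHIFT, BITS_PER_LEVEL, LEVELS and split_va are chosen here for illustration only.

```python
PAGE_SHIFT = 12        # assumed 4 KiB pages: low 12 bits are the page offset
BITS_PER_LEVEL = 9     # assumed 512 entries per table
LEVELS = ("pgd", "pud", "pmd", "pte")  # top level first

def split_va(va):
    """Return the table index used at each level, plus the page offset."""
    mask = (1 << BITS_PER_LEVEL) - 1
    parts = {}
    for depth, name in enumerate(LEVELS):
        # The top level uses the highest bits; each lower level uses the next 9.
        shift = PAGE_SHIFT + BITS_PER_LEVEL * (len(LEVELS) - 1 - depth)
        parts[name] = (va >> shift) & mask
    parts["offset"] = va & ((1 << PAGE_SHIFT) - 1)
    return parts

if __name__ == "__main__":
    print(split_va(0x7f45f88a6000))
```

In the kernel, the xxx_offset() variants go one step further and return a pointer to the entry at that index inside the table, which is what you follow to reach the next level.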
A process accesses memory during its runtime and typically this memory is referred to by virtual addresses. When it accesses an address that isn't in the TLB, the address must be walked through as described above to find out its location and permissions flags. There is no "getting the VA of a process", but when your program uses mmap or malloc and you get addresses of variables, those addresses are typically virtual addresses. You can look in /proc/proc_number/maps to see the virtual address layout of a process with PID proc_number. Note that with address space layout randomization turned on, this map will be different every time you run the same program.
I'm not sure, but you can probably test it by comparing that variable with the pgd address used in the mm/pagewalk.c file mentioned above.
Disclaimer: I am not a very experienced guy, and many questions might seem stupid or badly phrased.
I have heard about stacks and heaps and read a bit about them, but still a few things I don't quite understand:
How does a program find empty memory to store new variables/objects in physical memory.
How does a program know where an object starts and where an object ends in memory. With number variables I can imagine there is some extra information provided in memory that shows the program how many bits the variable occupies, but correct me if I'm wrong.
This is similar to my first question, but: when a variable has a value represented only by zeros, how does the program not confuse that with free memory?
Does the object value null mean that the address of an object is a bunch of 0's, or does the object point to literally nothing? And if so, how is the "reference" stored to assign it an address later on?
How does a program find empty memory to store new variables/objects in physical memory.
Modern operating systems use logical address translation. A process sees a range of logical addresses—its address space. The system hardware breaks the address range into pages. The size of the page is system dependent and is often configurable. The operating system manages page tables that map logical pages to physical page frames of the same size.
The address space is divided into a range of pages that is the system space, shared by all processes, and a user space, that is generally unique to each process.
Within the user and system spaces, pages may be valid or invalid. An invalid page has not yet been mapped to the process address space. Most pages are likely to be invalid.
Memory is always allocated from the operating system in pages. The operating system will have system services that transform invalid pages into valid pages with mappings to physical memory. In order to map pages, the operating system needs to find (or the application needs to specify) a range of pages that are invalid and then has to allocate physical page frames to map to those pages. Note that physical page frames do not have to be mapped contiguously to logical pages.
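A toy model may help here. The sketch below is not how any real kernel stores page tables; it just uses a dictionary from logical page number to physical frame number, with a 4 KiB page size assumed, to show valid versus invalid pages and non-contiguous frames. The names ToyPageTable, PageFault, map_page and translate are made up for this example.

```python
PAGE_SIZE = 4096  # assumed page size

class PageFault(Exception):
    """Raised when a logical address falls in an invalid (unmapped) page."""

class ToyPageTable:
    def __init__(self):
        self.entries = {}  # logical page number -> physical frame number

    def map_page(self, page, frame):
        # Making a page "valid" means giving it a frame to map to.
        self.entries[page] = frame

    def translate(self, address):
        page, offset = divmod(address, PAGE_SIZE)
        if page not in self.entries:
            raise PageFault(hex(address))
        return self.entries[page] * PAGE_SIZE + offset

if __name__ == "__main__":
    table = ToyPageTable()
    table.map_page(2, 7)  # consecutive logical pages...
    table.map_page(3, 1)  # ...backed by non-contiguous frames
    print(hex(table.translate(2 * PAGE_SIZE + 0x10)))
    print(hex(table.translate(3 * PAGE_SIZE + 0x10)))
```

Any address in a page that was never mapped raises PageFault, which plays the role of the hardware exception.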
You mention stacks and heaps. Stacks and heaps are just memory. The operating system cannot tell whether memory is a stack, heap or something else. User mode libraries for memory allocation (such as those that implement malloc/free) allocate memory in pages to create heaps. The only thing that makes this memory a heap is that there is a heap manager controlling it. The heap manager can then allocate smaller blocks of memory from the pages allocated to the heap.
A stack is simpler. It is just a contiguous range of pages. Typically an operating system service that creates a thread or process will allocate a range of pages for a stack and assign the hardware stack pointer register to the high end of the stack range.
How does a program know where an object starts and where an object ends in memory. With number variables I can imagine there is some extra information provided in memory that shows the program how many bits the variable occupies, but correct me if I'm wrong.
This depends upon how the program is created and how the object is created in memory. For typed languages, the linker binds variables to addresses. The linker also generates instructions for mapping those addresses to the address space. For stack/auto variables, the compiler generates offsets from a pointer to the stack. When a function/subroutine gets called, the compiler generates code to allocate the memory required by the procedure, which it does by simply subtracting from the stack pointer. The memory gets freed by simply adding that value back to the stack pointer.
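The stack-pointer arithmetic can be simulated in a few lines. This is only a toy: ToyStack, call, ret and local_address are hypothetical names, the stack is assumed to grow toward lower addresses, and a real compiler emits this bookkeeping as machine instructions rather than keeping a list of frame sizes.

```python
class ToyStack:
    def __init__(self, top):
        self.sp = top          # stack pointer starts at the high end
        self.frame_sizes = []  # bookkeeping for this toy only

    def call(self, frame_size):
        # Allocating a frame is just a subtraction from the stack pointer.
        self.sp -= frame_size
        self.frame_sizes.append(frame_size)
        return self.sp

    def local_address(self, offset):
        # A local variable lives at a fixed offset from the stack pointer.
        return self.sp + offset

    def ret(self):
        # Freeing the frame adds the same amount back.
        self.sp += self.frame_sizes.pop()

if __name__ == "__main__":
    stack = ToyStack(top=0x8000)
    stack.call(32)
    print(hex(stack.sp), hex(stack.local_address(8)))
    stack.ret()
    print(hex(stack.sp))
```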
In the case of typeless languages, such as assembly language or Bliss, the programmer has to keep track of the type for each location. When memory is allocated dynamically, the programmer also has to keep track of the type. Most programming languages help with this by having pointers with types.
This is similar to my first question, but: when a variable has a value represented only by zeros, how does the program not confuse that with free memory?
Free memory is invalid. Accessing free memory causes a hardware exception.
Does the object value null mean that the address of an object is a bunch of 0's, or does the object point to literally nothing? And if so, how is the "reference" stored to assign it an address later on?
The linker defines the initial state of a program's user address space. Most linkers do not map the first page (or even more than one page). That page is then invalid. That means a null pointer, as you say, references absolutely nothing. If you try to dereference a null pointer you will usually get some kind of access violation exception.
Most operating systems will allow the user to map the first page. Some linkers will allow the user to override the default setting and map the first page. This is not commonly done, as it makes detecting memory errors difficult.
How does a program find empty memory to store new variables/objects in physical memory.
Physical memory is managed by the OS, which knows which parts of the memory are used by processes and which parts are free. When it needs memory, a program asks the operating system for some. If this memory is for the heap, extra operations are needed. The operating system delivers memory in fixed-size blocks called pages. As a page is typically 4 KiB, if the user mallocs only a few bytes, the allocator needs to know which parts of the page are used or available, and to track page content across successive malloc and free calls, in order to use memory efficiently. There are specific data structures to describe used space and algorithms to find free space while limiting fragmentation.
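To give an idea of what such a data structure can look like, here is a minimal first-fit sketch over a single 4 KiB page. It is one simple strategy among many and not what any particular malloc implementation does; the names FirstFitHeap, malloc and free_block are illustrative, and "addresses" are just offsets into the page.

```python
PAGE_SIZE = 4096  # assumed page size

class FirstFitHeap:
    def __init__(self, size=PAGE_SIZE):
        self.holes = [(0, size)]  # sorted list of free (offset, length) ranges
        self.used = {}            # offset -> length of each live allocation

    def malloc(self, n):
        # First fit: take the first hole that is large enough.
        for i, (offset, length) in enumerate(self.holes):
            if length >= n:
                if length == n:
                    del self.holes[i]
                else:
                    self.holes[i] = (offset + n, length - n)
                self.used[offset] = n
                return offset
        return None  # no hole is big enough

    def free_block(self, offset):
        length = self.used.pop(offset)
        self.holes.append((offset, length))
        self.holes.sort()
        # Merge adjacent holes so that free space does not stay fragmented.
        merged = []
        for off, size in self.holes:
            if merged and merged[-1][0] + merged[-1][1] == off:
                merged[-1] = (merged[-1][0], merged[-1][1] + size)
            else:
                merged.append((off, size))
        self.holes = merged

if __name__ == "__main__":
    heap = FirstFitHeap()
    a = heap.malloc(100)
    b = heap.malloc(200)
    heap.free_block(a)
    print(heap.holes, heap.used)
```

Note that the allocator never inspects the bytes themselves to decide what is free; it only consults its own bookkeeping.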
How does a program know where an object starts and where an object ends in memory. With number variables I can imagine there is some extra information provided in memory that shows the program how many bits the variable occupies, but correct me if I'm wrong.
The program knows the address (i.e., the start) of every variable. For global or static variables it is generated by the linker when it places variables in memory. For local variables, the processor has means to compute it given the stack position. For allocated variables, it is stored in another variable (a pointer) when memory is allocated. Concerning the end, it depends on the type of the variable. For known types (like int) or compositions of known types (like structs) the size can be computed at compile time. In other situations, the program has no way to know the entity size. For instance a declaration like int * a may describe an array, but the program has no way to know the array size. The programmer must keep track of this information, for instance by storing the number of elements of the array in another variable.
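The same idea can be poked at from Python with the standard ctypes module, which mirrors C types. This is only a sketch: the Point structure and the variable names are made up, and fixed-width c_int32 is used instead of int so the sizes do not depend on the platform.

```python
import ctypes

class Point(ctypes.Structure):
    # A composition of known types: its size follows from its fields.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]

int_size = ctypes.sizeof(ctypes.c_int32)  # known from the type alone
point_size = ctypes.sizeof(Point)         # computed from the field types

# An array decays to a plain pointer: the pointer itself carries no
# element count, so the programmer keeps it in a separate variable.
count = 5
array = (ctypes.c_int32 * count)(1, 2, 3, 4, 5)
first = ctypes.cast(array, ctypes.POINTER(ctypes.c_int32))
total = sum(first[i] for i in range(count))

if __name__ == "__main__":
    print(int_size, point_size, total)
```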
This is similar to my first question, but: when a variable has a value represented only by zeros, how does the program not confuse that with free memory?
The program never looks at the memory to know if it is free or not. It is managed by other means (see question 1).
Does the object value null mean that the address of an object is a bunch of 0's, or does the object point to literally nothing? And if so, how is the "reference" stored to assign it an address later on?
An address is never a bunch of zeros, except for address 0 of memory. It is the content that is set to zero. Actually, it is not possible to read or write address 0: doing so generates an exception (a segmentation fault or "bus error", depending on the system; maybe you have already encountered it). Pointing to address zero is exactly like "pointing to literally nothing" and generates an error if dereferenced in a program. Variables that hold addresses of other variables are pointers, so the address of the pointer itself is well defined. What may not be defined is what it points to. It can be modified by assigning something to the pointer (for instance what malloc returned or the address of another variable).
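A small ctypes sketch of that distinction, for what it's worth: the pointer variable has storage of its own even while it holds the null value, and it can be pointed at something later. The variable names are illustrative, and ctypes reports a null dereference as a Python ValueError rather than letting the process crash.

```python
import ctypes

IntPtr = ctypes.POINTER(ctypes.c_int)

p = IntPtr()                              # a null pointer
storage_address = ctypes.addressof(p)     # where the pointer itself lives
null_value = ctypes.cast(p, ctypes.c_void_p).value  # None means address 0
was_null = not bool(p)

try:
    p.contents                            # dereferencing null is refused
    deref_error = None
except ValueError as exc:
    deref_error = exc

target = ctypes.c_int(42)
p.contents = target                       # now the pointer holds target's address
points_to_target = ctypes.cast(p, ctypes.c_void_p).value == ctypes.addressof(target)

if __name__ == "__main__":
    print(hex(storage_address), null_value, was_null, p.contents.value)
```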
I have been learning about virtual memory recently, and some questions were raised - especially regarding the initialization of all the structs.
Assume x86 architecture, Linux 2.4 (=> 2-level paging).
At the beginning, what do the PGD's entries contain, if they don't point to any allocated page table?
Same question for page tables: how are the entries initialized?
When a process creates a new memory area, say, for virtual addresses 100 - 200, does it also create (if needed) and initialize the page tables that correspond to those addresses? Or does it wait until there is an access to a specific address?
When a page table entry needs to be initialized with a physical address (say on write access), how does the OS select it?
thanks in advance.
Entries have a valid bit. So if there are no page tables allocated in a page directory, all entries theoretically have the valid bit off and it wouldn't matter what else is in the entry.
Same as above, except I think if a page table is created that means a page from this range has been accessed so at least one entry will be set as valid upon page table initialization. Otherwise, there'd be no reason to create an empty page table at all and take up memory.
I'm interpreting your "creating new memory area" as using a malloc() call. Malloc is a way to ask the OS for memory and to map that memory to your virtual address space. That memory comes from your heap virtual memory range and I don't think you can guarantee the specific addresses the OS uses, just the size. If you use mmap I think you do have the ability to ask for specific addresses to be used, but in general you only want to do this for specific cases like shared memory.
As for the page tables, I imagine that when the OS gets your memory during a malloc call, it will update your page table for you with the new mappings. If it doesn't during the malloc, then it will when you try to access the memory and it causes a page fault.
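If you want to see which address the kernel picks when you don't ask for a specific one, a minimal sketch using Python's standard mmap and ctypes modules is below. The helper name anon_mapping_address is made up; it creates an anonymous mapping, reads back its starting virtual address, and releases it. It does not show when the page tables are actually filled in, only that the returned address is page-aligned and chosen by the OS.

```python
import ctypes
import mmap

def anon_mapping_address(length):
    """Create an anonymous mapping and return its starting virtual address."""
    m = mmap.mmap(-1, length)            # fileno -1 requests anonymous memory
    view = ctypes.c_char.from_buffer(m)  # ctypes object sharing the mapping
    address = ctypes.addressof(view)
    del view                             # drop the buffer export before closing
    m.close()
    return address

if __name__ == "__main__":
    for _ in range(3):
        print(hex(anon_mapping_address(4 * mmap.PAGESIZE)))
```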
In Linux, the OS generally keeps track of a free list of pages so that it can easily grab memory without worrying about anyone else using it. My guess is the free list is initialized upon booting by communicating with the main memory controller/bitmap to know which spots of physical memory are in use, but maybe a hardware person can back this up.
When executed, program will start running from virtual address 0x80482c0. This address doesn't point to our main() procedure, but to a procedure named _start which is created by the linker.
My Google research so far just led me to some (vague) historical speculations like this:
There is folklore that 0x08048000 once was STACK_TOP (that is, the stack grew downwards from near 0x08048000 towards 0) on a port of *NIX to i386 that was promulgated by a group from Santa Cruz, California. This was when 128MB of RAM was expensive, and 4GB of RAM was unthinkable.
Can anyone confirm/deny this?
As Mads pointed out, in order to catch most accesses through null pointers, Unix-like systems tend to make the page at address zero "unmapped". Thus, accesses immediately trigger a CPU exception, in other words a segfault. This is much better than letting the application go rogue. The exception vector table, however, can be at any address, at least on x86 processors (there is a special register for that, loaded with the lidt opcode).
The starting point address is part of a set of conventions which describe how memory is laid out. The linker, when it produces an executable binary, must know these conventions, so they are not likely to change. Basically, for Linux, the memory layout conventions are inherited from the very first versions of Linux, in the early 90's. A process must have access to several areas:
The code must be in a range which includes the starting point.
There must be a stack.
There must be a heap, with a limit which is increased with the brk() and sbrk() system calls.
There must be some room for mmap() system calls, including shared library loading.
Nowadays, the heap, where malloc() goes, is backed by mmap() calls which obtain chunks of memory at whatever address the kernel sees fit. But in older times, Linux was like previous Unix-like systems, and its heap required a big area in one uninterrupted chunk, which could grow towards increasing addresses. So whatever was the convention, it had to stuff code and stack towards low addresses, and give every chunk of the address space after a given point to the heap.
But there is also the stack, which is usually quite small but could grow quite dramatically on some occasions. The stack grows down, and when the stack is full, we really want the process to predictably crash rather than overwrite some data. So there had to be a wide area for the stack, with, at the low end of that area, an unmapped page. And lo! There is an unmapped page at address zero, to catch null pointer dereferences. Hence it was defined that the stack would get the first 128 MB of address space, except for the first page. This means that the code had to go after those 128 MB, at an address similar to 0x080xxxxx.
As Michael points out, "losing" 128 MB of address space was no big deal because the address space was very large with regard to what could actually be used. At that time, the Linux kernel was limiting the address space for a single process to 1 GB, out of a maximum of 4 GB allowed by the hardware, and that was not considered to be a big issue.
Why not start at address 0x0? There are at least two reasons for this:
Because address zero is famously known as the NULL pointer, and is used by programming languages to sanity-check pointers. You can't use an address value for that if you're going to execute code there.
The actual contents at address 0 are often (but not always) the exception vector table, and are hence not accessible in non-privileged modes. Consult the documentation of your specific architecture.
As for the entrypoint _start vs main:
If you link against the C runtime (the C standard libraries), the library wraps the function named main, so it can initialize the environment before main is called. On Linux, these are the argc and argv parameters to the application, the env variables, and probably some synchronization primitives and locks. It also makes sure that returning from main passes on the status code, and calls the _exit function, which terminates the process.