stack memory allocation - multithreading

I had a debate with friend who said that in C, the memory is allocated in compile time.
I said it cannot be true since memory can be allocated only when the process loads to memory.
From this to that we started talking about stack allocation, after reading, I know its OS implementation,
But usually the stack start in default size, 1 MB (sequential) lets say, of reserved Virtual address, and when the function is called (no matter where we are on the stack of that thread),
A block of bytes in size X (all local variable for that function?) is commited (allocated) from the reserved address.
What I would like to understand, When we enter the function, How does the OS knows how big is the size to allocate (commit from reserved)?
Does it happen dynamically during the function execution ? or the compiler knows to calculate before each function what size this function needs ?

The compiler can and does (at compile time) count up the sizes of all the local variables that the function declares on the stack, and thereby it knows (again, at compile time) how much it will need to increase the stack-pointer when the function is entered (and decrease it when the function returns). This computed amount-to-increase-the-stack-pointer-by value will be written into the executable code directly at compile time, so that it doesn't need to be re-computed when the function is called at run time.
The exception to the above is when your C program is using C99s's variable-length-arrays (VLA) feature. As suggested by the name, variable-length-arrays are arrays whose size is not known until run-time, and as such the compiler will have to emit special code for any function that contains one or more VLAs, such that the amount by which to increase the stack-pointer is calculated at run-time.
Note that the physical act of mapping virtual stack addresses to physical RAM addresses (and making sure that the necessary RAM is allocated) is done at run time, and is handled by the operating system, not by the compiler. In particular, if a process tries to access a virtual address (on the stack or otherwise) that is not currently mapped to any physical address, a page fault will be generated by the MMU. The process's executation will be temporarily paused while a page-fault-handler routine executes. The page-fault-handler will evaluate the legality of the virtual address the process tried to access; if the virtual address was a legal one, the page-fault-handler will map it to an appropriate page of physical RAM, and then let the process continue executing. If the virtual address was not one the process is allowed to access (or if the attempt to procure a page of physical RAM failed, e.g. because the computer's memory is full), then the mapping will fail and the OS will halt/crash the process.

Related

How does the referencing of objects and variables in programs work?

Disclaimer: I am not a very experienced guy, and many questions might seem stupid or badly phrased.
I have heard about stacks and heaps and read a bit about them, but still a few things I don't quite understand:
How does a program find empty memory to store new variables/objects in physical memory.
How does a program know where an object starts and where an object ends in memory. With number variables I can imagine there is a few extra information provided in memory that show the porgram how many bits the variable occupies, but correct me if I'm wrong.
This is similar to my first question, but: when a variable has a value representd only by zeros, how does the program not confuse that with free memory.
Does the object value null mean that the address of an object is a bunch of 0's or does the object point to litterally nothing? And if so, how is the "reference" stored to assign it an address later on?
How does a program find empty memory to store new variables/objects in physical memory.
Modern operating systems use logical address translation. A process sees a range of logical addresses—its address space. The system hardware breaks the address range into pages. The size of the page is system dependent and is often configurable. The operating system manages page tables that map logical pages to physical page frames of the same size.
The address space is divided into a range of pages that is the system space, shared by all processes, and a user space, that is generally unique to each process.
Within the user and system spaces, pages may be valid or invalid. An invalid page has not yet been mapped to the process address space. Most pages are likely to be invalid.
Memory is always allocated from the operating system image pages. The operating system will have system services that transform invalid pages into valid pages with mappings to physical memory. In order to map pages, the operating system needs to find (or the application needs to specify) a range of pages that are invalid and then has to allocate physical page frames to map to the those pages. Note that physical page frames do not have to be mapped contiguously to logical pages.
You mention stacks and heaps. Stacks and heap are just memory. The operating system cannot tell whether memory is a stack, heap or something else. User mode libraries for memory allocation (such as those that implement malloc/free) allocate memory in pages to create heaps. The only thing that makes this memory a heap is that there is a heap manager controlling it. The heap manager can then allocate smaller blocks of memory from the pages allocated to the heap.
A stack is simpler. It is just a contiguous range of pages. Typically an operating system service that creates a thread or process will allocate a range of pages for a stack and assign the hardware stack pointer register to the high end of the stack range.
How does a program know where an object starts and where an object ends in memory. With number variables I can imagine there is a few extra information provided in memory that show the porgram how many bits the variable occupies, but correct me if I'm wrong.
This depends upon how the program is created and how the object is created in memory. For typed languages, the linker binds variables to addresses. The linker also generates instruction for mapping those addresses to the address space. For stack/auto variables, the compiler generates offsets from a pointer to the stack. When a function/subroutine gets called, the compiler generates code to allocate the memory required by the procedure, which it does by simply subtracting from the stack pointer. The memory gets freed by simply adding that value back to the stack pointer.
In the case of typeless languages, such as assembly language or Bliss, the programmer has to keep track of the type for each location. When memory is dynamically, the programmer also has to keep track of the type. Most programming languages help this out by having pointers with types.
This is similar to my first question, but: when a variable has a value representd only by zeros, how does the program not confuse that with free memory.
Free memory is invalid. Accessing free memory causes a hardware exception.
Does the object value null mean that the address of an object is a bunch of 0's or does the object point to litterally nothing? And if so, how is the "reference" stored to assign it an address later on?
The linker defines the initial state of a program's user address space. Most linkers do not map the first page (or even more than one page). That page is then invalid. That means a null pointer, as you say, references absolutely nothing. If you try to dereference a null pointer you will usually get some kind of access violation exception
Most operating system will allow the user to map the first page. Some linkers will allow the user to override the default setting and map the first page. This is not commonly done as it makes detecting memory error difficult.
How does a program find empty memory to store new variables/objects in physical memory.
Physical memory is managed by the OS that knows which parts of the memory are used by processes and which parts are free. When it needs memory, a program asks the operating system to use parts of the memory. If this memory is for the heap, extra operations are needed. The operating systems delivers memory by fixed size blocks called pages. As a page is 4kbytes, if the user mallocs some bytes, there is a need, to optimize memory use, to know which parts of the page are used or available and to monitor page content after successive malloc and free. There are specific data structures to describe used space and algorithms to find space, whilst avoiding fragmentation.
How does a program know where an object starts and where an object ends in memory. With number variables I can imagine there is a few extra information provided in memory that show the porgram how many bits the variable occupies, but correct me if I'm wrong
The program knows the address (ie the start) of every variable. For global or static variables it is generated by the linker when it places vars in memory. For local variables, the processor has means to compute it given the stack position. For allocated variables, it is stored in another variable (a pointer) when memory is allocated. Concerning the end, it depends on the type of variables. For known types (like int) or composition of known types (like structs) it can be computed at compile time. In other situations, the program has no way to know the entity size. For instance a declaration like int * a may describe an array, but the program has no way to know the array size. The programmer must keep track of this information, for instance by writing the number of elements in the array in another variable.
This is similar to my first question, but: when a variable has a value representd only by zeros, how does the program not confuse that with free memory.
The program never looks at the memory to know if it is free or not. It managed by other means (see question 1).
Does the object value null mean that the address of an object is a bunch of 0's or does the object point to litterally nothing? And if so, how is the "reference" stored to assign it an address later on?
An address is never a bunch of zero, except for address '0' of memory. It is the content that is set to zero. Actually, it not possible to read or write address 0. It generates a "bus error" exception (and maybe you have already encountered it). Pointing to a zero address is exactly like "pointing to litterally nothing" and generate an error if encountered in a program. These variables hold addresses of other vars (pointer). So the address of the pointer is well defined. Was may not be defined is what it points to. It can be modified by assigning something to the pointer (for instance what malloc returned or the address of another var).

Can a process start executing with 0 Pages in main memory?

I am reading a Book on Operating System (Galvin). While explaining Demand Paging it says
In the extreme case, we can start executing a process with no pages in
memory. When the operating system sets the instruction pointer to the first
instruction of the process, which is on a non-memory-resident page, the process
immediately faults for the page.
My question is how OS can set the instruction pointer for a process for which not even single page is in memory (because the address in instruction pointer cannot be a disc or secondary memory address, It has to be a main memory address but 0 pages means nothing is in memory).
That's what virtual memory is. It means that there's an ephemeral mapping between logical addresses, which are known and constant, and physical addresses, which are transient. The normal level of processing then works purely in logical addresses, without necessarily having any knowledge of what's going on physically.
So the OS would e.g. say that the binary A is logically available at address N. It will then mark in the virtual map that the pages covering N to N+(size of binary) are currently faults. Having set the PC to N (or whatever the entry point is), the MMU will fire a fault as soon as the CPU tries to read from the PC. At that point the paging mechanism will catch the fault and do the usual things.

Virtual memory sections and memory mapping area

As process has virtual memory which is copied into RAM during run time. As given in the previous post.
Which part of process virtual memory layout does mmap() uses?
I have following doubles :
If memory mapping is inside unallocated memory and it is inside process's virtual memory. As virtual memory helps to avoid one process to touch other process's virtual memory. Then how can memory mapping is used for Interprocess Communication(IPC)?
In OS like Linux, whether has each individual process separate section of heap, stack and memory mapping or all processes have one common section for heap, stack and MMAP?
Example :
if there are P1,P2 and P3 processes are running on linux OS. will all have common table as given in picture or each individual task have separate table to each section.
In 32 bit system, 2^32=4 gigabytes of virtual memory is possible and 1G byte is reserved for kernel and 3 gigabytes for userspace applications. can each individual process have up to 3 gigabytes of virtual memory or sum of all userspace applications size could be 3 gigabytes (i.e virtual memory size of (P1+P2+P3)<=3 gigabytes)?
--
Learner
Using memory mapping for IPC works by mapping the same range of physical memory into two or more virtual address ranges in different processes. This works for communication because both processes are using the exact same memory cells (although they might "see" them differently, at different addresses). You change a value in one mapping, and it is instantly visible in the other mapping in a different process because it is the very same memory.
Every process has its own independent stack and heap. The OS does not care about that at all, it only cares about pages. The heap and the stack are things that are implemented by the application (via the runtime). When you call a function like malloc, the allocator in the runtime either returns a block that it already had reserved earlier or one that it has recylced (you called free earlier), or it asks the OS to reserve some more memory (sbrk or mmap). When you first access this memory, the OS sees a page fault and verifies that you are allowed to access this location (because you've reserved it) and then provides a valid page.
Every process can use (as in "reserve") the whole available address space (3GiB in your example). This does not interfere with any other process. Note that due to fragmentation and alignment, and because your executable and the stack take away a little bit, you will in practice not be able to allocate the full 3 GiB, but you can get close to it.
All processes together can use as much virtual memory as is available on the system (physical RAM plus swap space), but they can only use as much as there is physical memory available at the same time (minus a little bit for this and that, like unpageable kernel memory and such).

How does the amount of memory for a process get determined?

From my understanding, when a process is under execution it has some amount of memory at it's disposal. As the stack increases in size it builds from one end of the process (disregarding global variables that come before the stack), while the heap builds from another end. If you keep adding to the stack or heap, eventually all the memory will be used up for this process.
How does the amount of memory the process is given get determined? I can only imagine it depends on a bunch of different variables, but an as-general-as-possible response would be great. If things have to get specific, I'm interested in linux processes written in C++.
On most platforms you will encounter, Linux runs with virtual memory enabled. This means that each process has its own virtual address space, the size of which is determined only by the hardware and the way the kernel has configured it.
For example, on the x86 architecture with a "3/1" split configuration, every userspace process has 3GB of address space available to it, within which the heap and stack are allocated. This is regardless of how much physical memory is available in the system. On the x86-64 architecture, 128TB of address space is typically available to each userspace process.
Physical memory is separately allocated to back that virtual memory. The amount of this available to a process depends upon the configuration of the system, but in general it's simply supplied "on-demand" - limited mostly how much physical memory and swap file space exists, and how much is currently in use for other purposes.
The stack does not magically grow. It's size is static and the size is determined at linking time. So when you take enough space from the stack, it overflows (stack overflow ;)
On the other hand, the heap area 'magically' grows. Meaning that when ever more memory is needed for heap, the program asks operating system for more memory.
EDIT: As Mat pointed out below, the stack actually can increase during runtime on modern operating systems.

Why is the ELF execution entry point virtual address of the form 0x80xxxxx and not zero 0x0?

When executed, program will start running from virtual address 0x80482c0. This address doesn't point to our main() procedure, but to a procedure named _start which is created by the linker.
My Google research so far just led me to some (vague) historical speculations like this:
There is folklore that 0x08048000 once was STACK_TOP (that is, the stack grew downwards from near 0x08048000 towards 0) on a port of *NIX to i386 that was promulgated by a group from Santa Cruz, California. This was when 128MB of RAM was expensive, and 4GB of RAM was unthinkable.
Can anyone confirm/deny this?
As Mads pointed out, in order to catch most accesses through null pointers, Unix-like systems tend to make the page at address zero "unmapped". Thus, accesses immediately trigger a CPU exception, in other words a segfault. This is quite better than letting the application go rogue. The exception vector table, however, can be at any address, at least on x86 processors (there is a special register for that, loaded with the lidt opcode).
The starting point address is part of a set of conventions which describe how memory is laid out. The linker, when it produces an executable binary, must know these conventions, so they are not likely to change. Basically, for Linux, the memory layout conventions are inherited from the very first versions of Linux, in the early 90's. A process must have access to several areas:
The code must be in a range which includes the starting point.
There must be a stack.
There must be a heap, with a limit which is increased with the brk() and sbrk() system calls.
There must be some room for mmap() system calls, including shared library loading.
Nowadays, the heap, where malloc() goes, is backed by mmap() calls which obtain chunks of memory at whatever address the kernel sees fit. But in older times, Linux was like previous Unix-like systems, and its heap required a big area in one uninterrupted chunk, which could grow towards increasing addresses. So whatever was the convention, it had to stuff code and stack towards low addresses, and give every chunk of the address space after a given point to the heap.
But there is also the stack, which is usually quite small but could grow quite dramatically in some occasions. The stack grows down, and when the stack is full, we really want the process to predictably crash rather than overwriting some data. So there had to be a wide area for the stack, with, at the low end of that area, an unmapped page. And lo! There is an unmapped page at address zero, to catch null pointer dereferences. Hence it was defined that the stack would get the first 128 MB of address space, except for the first page. This means that the code had to go after those 128 MB, at an address similar to 0x080xxxxx.
As Michael points out, "losing" 128 MB of address space was no big deal because the address space was very large with regards to what could be actually used. At that time, the Linux kernel was limiting the address space for a single process to 1 GB, over a maximum of 4 GB allowed by the hardware, and that was not considered to be a big issue.
Why not start at address 0x0? There's at least two reasons for this:
Because address zero is famously known as a NULL pointer, and used by programming languages to sane check pointers. You can't use an address value for that, if you're going to execute code there.
The actual contents at address 0 is often (but not always) the exception vector table, and is hence not accessible in non-privileged modes. Consult the documentation of your specific architecture.
As for the entrypoint _start vs main:
If you link against the C runtime (the C standard libraries), the library wraps the function named main, so it can initialize the environment before main is called. On Linux, these are the argc and argv parameters to the application, the env variables, and probably some synchronization primitives and locks. It also makes sure that returning from main passes on the status code, and calls the _exit function, which terminates the process.

Resources