Memory: Stack and Swap - linux

When there isn't enough RAM, dynamically allocated variables on the heap can take advantage of swap space on the disk (albeit at a cost in performance). My question is whether the stack can take advantage of swap space as well.
For example, the following program places a large array on the stack. (Of course, usually we would dynamically allocate large variables on the heap.) If this program crashes when run, can I make it run successfully by adding swap space?
int main()
{
    int myArray[1000000];
    return 0;
}

Actually, that is exactly what swap does: it swaps out program data and stack space alike:
http://www.linuxjournal.com/article/10678
These are placed in anonymous pages, so named because they have no named filesystem source. Once modified, an anonymous page must remain in RAM for the duration of the program unless there is secondary storage to write it to. The secondary storage used for these modified anonymous pages is what we call swap space.
The classic recommendations on systems that do strict VM accounting vary, but most of them hover around a “twice the amount of RAM” figure. That number assumes your memory mostly will be filled with a bunch of small interactive programs (where their stack space is possibly their largest memory demand).
Say you're running a Web server with 500 threads, each with 8MB of stack space. That stack space alone is going to require that you have 4GB of swap space configured for the memory accountant to be happy.

Related

Is stack memory contiguous physically in Linux?

As far as I can see, stack memory is contiguous in the virtual address space, but is it also contiguous physically? And does this have something to do with the stack size limit?
Edit:
I used to believe that stack memory doesn't have to be contiguous physically, but then why do we think that stack memory is always quicker than heap memory? If it's not physically contiguous, how can the stack take more advantage of the cache? Another thing that always confuses me: the CPU executes instructions and accesses data in segments that are not near the stack segment in virtual memory. I don't think the operating system will place the stack segment and the data segment close to each other physically, so this might hurt cache behavior. What do you think?
Edit again:
Maybe I should give an example to express myself better: if we want to sort a large quantity of numbers, using an array to store them is better than using a linked list, because every list node may be allocated with malloc, so the list may not take good advantage of the cache. That's why I say stack memory is quicker than heap memory.
As far as I can see, stack memory is contiguous in the virtual address space, but is it also contiguous physically? And does this have something to do with the stack size limit?
No, stack memory is not necessarily contiguous in the physical address space. It's not related to the stack size limit. It's related to how the OS manages memory. The OS only allocates a physical page when the corresponding virtual page is accessed for the first time (or for the first time since it got paged out to the disk). This is called demand-paging, and it helps conserve memory usage.
Why do we think that stack memory is always quicker than heap memory? If it's not physically contiguous, how can the stack take more advantage of the cache?
It has nothing to do with the cache. It's just faster to allocate and deallocate memory from the stack than from the heap. That's because allocating and deallocating from the stack takes only a single instruction (incrementing or decrementing the stack pointer). On the other hand, there is a lot more work involved in allocating and deallocating memory from the heap. See this article for more information.
Once memory has been allocated (from the heap or the stack), the time it takes to access that memory region does not depend on whether it's stack or heap memory. It depends on the memory access pattern and on whether that pattern is friendly to the cache and memory architecture.
If we want to sort a large quantity of numbers, using an array to store them is better than using a linked list, because every list node may be allocated with malloc, so the list may not take good advantage of the cache. That's why I say stack memory is quicker than heap memory.
Using an array is faster not because arrays are allocated from the stack. Arrays can be allocated from any memory (stack, heap, or anywhere else). It's faster because arrays are usually accessed sequentially, one element at a time. When the first element is accessed, a whole cache line that contains the element and its neighbors is fetched from memory into the L1 cache. So accessing the other elements in that cache line can be done very efficiently, but accessing the first element in the cache line is still slow (unless the cache line was prefetched). This is the key part: cache lines (64 bytes on typical x86 CPUs) are 64-byte aligned, and virtual and physical pages are at least 4 KiB and aligned to their size (and therefore also 64-byte aligned), so any cache line is guaranteed to reside entirely within a single virtual page and a single physical page. This is what makes fetching cache lines efficient. Again, all of this has nothing to do with whether the array was allocated from the stack or the heap. It holds true either way.
On the other hand, since the elements of a linked list are typically not contiguous (not even in the virtual address space), a cache line that contains one element may not contain any other elements. So fetching every single element can be more expensive.
Memory is memory. Stack memory is no faster than heap memory and no slower. It is all the same. The only thing that makes memory a stack or a heap is how the application allocates it. It is entirely possible to allocate memory on the heap and make that the program stack.
The speed difference is in the allocation. Stack memory is allocated by subtracting from the stack pointer: one instruction.
The process of allocating from the heap depends on the heap manager, but it is much more complex and may require mapping pages into the address space.
No, there is no promise that the physical addresses are contiguous. But it doesn't matter, because user-space programs don't use physical addresses, so they have no idea whether they are.
It is a complex topic.
The heap and the stack (usually) use the same kind of memory with the same memory type (MTRR, per-page cache settings, etc.). [mmap, files, and drivers could use different strategies, as could memory whose attributes the user explicitly changes.]
The stack could be faster because it is used often. When you call a function, its parameters and local variables are placed on the stack, so that part of the cache stays fresh. Additionally, because functions are called and return frequently, more of the stack is probably held in the other cache levels, and the top of the stack is seldom paged out (because it was used recently).
So the stack could be faster, but only if you have few variables. If you allow large arrays on the stack, e.g. with alloca, the advantage disappears.
In general, this is a very complex topic, and it is better not to optimize too much, because it can make the code complex, and therefore harder to refactor and to optimize at a higher level. (For example, with multi-dimensional arrays, the order of the indices (and so of memory) and of the loops can noticeably improve speed, but the code can also quickly become impossible to maintain.)

Virtual memory sections and memory mapping area

A process has virtual memory, which is copied into RAM at run time, as described in the previous post.
Which part of the process's virtual memory layout does mmap() use?
I have the following doubts:
If the memory-mapping area lies in unallocated memory inside the process's virtual memory, and virtual memory keeps one process from touching another process's memory, then how can memory mapping be used for interprocess communication (IPC)?
In an OS like Linux, does each process have its own separate heap, stack, and memory-mapping sections, or do all processes share one common section for the heap, stack, and mmap?
Example:
If processes P1, P2, and P3 are running on Linux, will they all share a common table as shown in the picture, or will each task have a separate table for each section?
On a 32-bit system, 2^32 = 4 gigabytes of virtual memory is possible, with 1 gigabyte reserved for the kernel and 3 gigabytes for userspace applications. Can each individual process have up to 3 gigabytes of virtual memory, or must all userspace applications together fit in 3 gigabytes (i.e. virtual memory size of (P1+P2+P3) <= 3 gigabytes)?
Using memory mapping for IPC works by mapping the same range of physical memory into two or more virtual address ranges in different processes. This works for communication because both processes are using the exact same memory cells (although they might "see" them differently, at different addresses). You change a value in one mapping, and it is instantly visible in the other mapping in a different process because it is the very same memory.
Every process has its own independent stack and heap. The OS does not care about that at all; it only cares about pages. The heap and the stack are things that are implemented by the application (via the runtime). When you call a function like malloc, the allocator in the runtime either returns a block that it had already reserved earlier or one that it has recycled (because you called free earlier), or it asks the OS to reserve some more memory (sbrk or mmap). When you first access this memory, the OS sees a page fault, verifies that you are allowed to access this location (because you've reserved it), and then provides a valid page.
Every process can use (as in "reserve") the whole available address space (3GiB in your example). This does not interfere with any other process. Note that due to fragmentation and alignment, and because your executable and the stack take away a little bit, you will in practice not be able to allocate the full 3 GiB, but you can get close to it.
All processes together can use as much virtual memory as is available on the system (physical RAM plus swap space), but only as much of it can be resident at the same time as there is physical memory (minus a little for this and that, such as unpageable kernel memory).

Where does virtual memory exist in linux?

A program is stored on flash/disk. To execute it, the program is loaded into virtual memory and mapped to RAM by the virtual memory manager. During its execution, the process is in RAM. Then where does virtual memory exist (where are the .text, .data, stack, and heap)?
The virtual memory is a view of the RAM plus possibly some swap space, provided by a virtual memory manager. Modern OSs have virtual memory managers and provide virtual memory to processes so that an executing program can behave as if it had a contiguous address space whose size is not limited by the actual RAM. The pages or blocks making up the virtual memory can be mapped anywhere in RAM, so contiguous virtual pages need not be stored in contiguous RAM areas. Alternatively, they can be swapped out to paging space or swap space and wait there until needed, whereupon the OS reads them back and maps them to some RAM page.
When you say
During its execution, the process is in RAM.
This is not entirely correct. Some or all memory pages that belong to the process may be swapped out, as explained.
One more word concerning the answers and comments that say that "virtual" means it doesn't exist. This makes no sense. On the contrary, according to Webster:
being such in essence or effect ...
Hence virtual memory is something (therefore, it exists!) that behaves as if it were memory.
Virtual memory is like an illusion of RAM. Through paging, it lets the processes in an operating system use more memory than the RAM physically provides.
Virtual memory means memory you can access with "normal" memory access methods, although it isn't clear where the data is actually stored.
It may be
actually in RAM
in a swap area
in another file (memory mapped file)
and access to it will be handled appropriately.
It is a layer of, well, virtualization so that you as a programmer don't have to worry about where the data is actually put.
The original purpose was mainly to provide more memory to processes than is physically installed, extending it by means of swap space, but there are more benefits:
The OS is free to use the RAM for whatever it deems necessary, e.g. caching. Under some circumstances, it may be more effective to use RAM for cache than for holding parts of a program that haven't been used for a long time.
Provide additional memory to a program when it requests it: if you call malloc(), the C library may ask the OS for a region of memory that can be attached seamlessly to the address space.
Avoid stack overflow: if the stack grows larger and larger, the respective memory section can be extended transparently so that the program doesn't have to worry about it.
A system can even "overcommit" memory: if a process requests a large amount of memory, the OS may say "yes, OK", i.e. provide the memory to the program. In the first place, that means "allow the program to access a certain address-space area", but this address space is not immediately backed by physical memory. Only when the program accesses this memory is the mapping established, and if that cannot be fulfilled, the program is killed by the Out-Of-Memory (OOM) killer (at least under Linux).
All this works by assigning physical memory to a program page by page (a page is typically 4 KiB), as seen through the program's address space, in whatever amount and at whatever time it is needed.

Why has a (C-)stack a maximum of 2mb?

This question is about stack overflows, so where better to ask it than here.
If we consider how memory is used for a program (a.out) in unix, it is something like this:
| etext | stack, 2mb | heap ->>>
And I have wondered for a few years now why there is a restriction of 2MB for the stack. Considering that we have 64 bits for a memory address, why not allocate like this:
| MIN_ADDR MAX_ADDR|
| heap ->>>> <<<- stack | etext |
MAX_ADDR will be somewhere near 2^64 and MIN_ADDR somewhere near 2^0, so there are many bytes in between which the program can use, but are not necessarily accounted for by the kernel (by actually assigning pages for them). The heap and stack will probably never reach each other, and hence the 2MB limit is not needed ( and would instead have a ~1.8446744e+19 bytes limit). If we are scared that they will reach each other, then set the limit to 2^63 or some bizarre and enormous number.
Furthermore, the heap grows from low to high, so our kernel can still resize blocks of memory (allocated with for example malloc) without necessarily needing to shift the content.
Moreover, a stack frame is always static in size in some way. So we never need to resize it; if we did, that would be awkward anyway, since we would also need to change the whole pointer structure used by return and created by call.
I read this as an answer on another stackoverflow question:
"My intuition is the following. The stack is not as easy to manage as the heap. The stack need to be stored in continuous memory locations. This means that you cannot randomly allocate the stack as needed, but you need to at least reserve virtual addresses for that purpose. The larger the size of the reserved virtual address space, the fewer threads you can create."
Source: Why is the page size of Linux (x86) 4 KB, how is that calcualted
But we have loads of memory addresses! So this makes no sense. So why 2MB?
The reason I ask is that allocating memory on the stack is quite safe with respect to dangling pointers and memory leaks:
e.g. I prefer
int foo[5];
instead of
int *foo = malloc(5*sizeof(int));
It deallocates itself automatically. Also, allocation on the stack is faster than allocation with malloc. However, if I allocate an image (e.g. a JPEG or PNG) on the stack, I am in danger of overflowing the stack.
Another point on this matter, why not also allow this:
int *huge_list_of_data = malloc(1000*sizeof(char), 10000000000*sizeof(char));
where we allocate a list object that initially has a size of 1 KB, but we ask the kernel to allocate it such that the page it is put on is not used for anything else, and that we want to have 10 GB of pages behind it, which can be (partially) swapped in when necessary.
This way we don't need 10GB of memory, we only need 10GB of memory addresses.
So why not:
void *malloc( unsigned long, unsigned long );
?
In essence: WHY NOT USE THE PAGING SYSTEM OF UNIX TO SOLVE OUR MEMORY ALLOCATION PROBLEMS?
Thank you for reading.

How does the amount of memory for a process get determined?

From my understanding, when a process is executing it has some amount of memory at its disposal. As the stack grows, it builds from one end of the process's memory (disregarding global variables that come before the stack), while the heap builds from the other end. If you keep adding to the stack or heap, eventually all the memory for this process will be used up.
How does the amount of memory the process is given get determined? I can only imagine it depends on a bunch of different variables, but an as-general-as-possible response would be great. If things have to get specific, I'm interested in linux processes written in C++.
On most platforms you will encounter, Linux runs with virtual memory enabled. This means that each process has its own virtual address space, the size of which is determined only by the hardware and the way the kernel has configured it.
For example, on the x86 architecture with a "3/1" split configuration, every userspace process has 3GB of address space available to it, within which the heap and stack are allocated. This is regardless of how much physical memory is available in the system. On the x86-64 architecture, 128TB of address space is typically available to each userspace process.
Physical memory is allocated separately to back that virtual memory. The amount available to a process depends on the configuration of the system, but in general it's simply supplied "on demand", limited mostly by how much physical memory and swap space exist and how much is currently in use for other purposes.
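The address-space figures above follow from the address width. As a worked check, assuming the classic 3/1 split on 32-bit x86 and the common 4-level page tables on x86-64 (47-bit user addresses), the 128TB quoted corresponds to 128 TiB:

$$2^{32}\ \text{bytes} = 4\ \text{GiB}, \qquad 4\ \text{GiB} - 1\ \text{GiB (kernel)} = 3\ \text{GiB (user space)}$$

$$2^{47}\ \text{bytes} = 2^{7} \times 2^{40}\ \text{bytes} = 128\ \text{TiB (user space on x86-64)}$$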
The stack does not magically grow. Its size is static and is determined at link time. So when you take enough space from the stack, it overflows (hence "stack overflow" ;-).
On the other hand, the heap area 'magically' grows: whenever more memory is needed for the heap, the program asks the operating system for more memory.
EDIT: As Mat pointed out below, the stack actually can increase during runtime on modern operating systems.
