Is stack memory contiguous physically in Linux?

Is stack memory contiguous physically in Linux? - linux

As far as I can see, stack memory is contiguous in virtual memory address, but stack memory is also contiguous physically? And does this have something to do with the stack size limit?
Edit:
I used to believe that stack memory doesn't has to be contiguous physically, but why do we think that stack memory is always quicker than heap memory? If it's not physically contiguous, how can stack take more advantage of cache? And there is another thing that always confuse me, cpu executes directives in data segment, which is not near the stack segment in virtual memory, I don't think the operating system will make stack segment and data segment close to each other physically, so this might do harm to the cache effect, what do you think?
Edit again:
Maybe I should give an example to express myself better, if we want to sort a large amount of numbers, using array to store the numbers is better than using a list, because every list node may be constructed by malloc, so it may not take good advantage of cache, that's why I say stack memory is quicker than heap memory.

As far as I can see, stack memory is contiguous in virtual memory
address, but stack memory is also contiguous physically? And does this
have something to do with the stack size limit?
No, stack memory is not necessarily contiguous in the physical address space. It's not related to the stack size limit. It's related to how the OS manages memory. The OS only allocates a physical page when the corresponding virtual page is accessed for the first time (or for the first time since it got paged out to the disk). This is called demand-paging, and it helps conserve memory usage.
why do we think that stack memory is always quicker
than heap memory? If it's not physically contiguous, how can stack
take more advantage of cache?
It has nothing to do with the cache. It's just faster to allocate and deallocate memory from the stack than the heap. That's because allocating and deallocating from the stack takes only a single instruction (incrementing or decrementing the stack pointer). On the other hand, there is a lot more work involved into allocating and/or deallocating memory from the heap. See this article for more information.
Now once memory allocated (from the heap or stack), the time it takes to access that allocated memory region does not depend on whether it's stack or heap memory. It depends on the memory access behavior and whether it's friendly to the cache and memory architecture.
if we want to sort a large amount of numbers, using array to store the
numbers is better than using a list, because every list node may be
constructed by malloc, so it may not take good advantage of cache,
that's why I say stack memory is quicker than heap memory.
Using an array is faster not because arrays are allocated from the stack. Arrays can be allocated from any memory (stack, heap, or anywhere). It's faster because arrays are usually accessed contiguously one element at a time. When the first element is accessed, a whole cache line that contains the element and other elements is fetched from memory to the L1 cache. So accessing the other elements in that cache line can be done very efficiently, but accessing the first element in the cache line is still slow (unless the cache line was prefetched). This is the key part: since cache lines are 64-byte aligned and both virtual and physical pages are 64-byte aligned as well, then it's guaranteed that any cache line fully resides within a single virtual page and a single physical page. This what makes fetching cache lines efficient. Again, all of this has nothing to do with whether the array was allocated from the stack or heap. It holds true either way.
On the other hand, since the elements of a linked list are typically not contiguous (not even in the virtual address space), then a cache line that contains an element may not contain any other elements. So fetching every single element can be more expensive.

Memory is memory. Stack memory is no faster than heap memory and is no slower. It is all the same. The only thing that makes a memory a stack or a heap is how it is allocated by the application. It is entirely possible to allocate a memory on the heap and make that the program stack.
The speed difference is in the allocation. Stack memory is allocated by subtracting from the stack pointer: one instruction.
The process of allocating heap depends upon the heap manager but it is much more complex and may requiring mapping pages to the address space.

No, there is no promise of contiguity of physical addresses. But it doesn't matter, because user-space programs don't use physical addresses, so have no idea that this is the case.

It is a complex topic.
Heap and stack have (usually) the same memory and memory type (MTRR, cache setting per page, etc.). [mmap, files, drivers could have different strategies, or when user explicit change it].
Stack could be faster, because it is often used. When you call a function, parameters and local variables are put into stack, so the cache is fresh. Additionally, because functions call and return often, probably there is some more stack in the other cache level, and seldom the top of stack is paged (because it was used recently).
So cache could be faster, but just if you have few variables. If you allow large arrays on stack e.g. with alloca, the advantage disappear.
In general, this is a very complex topic, and it is better not to optimize too much, because it could cause complex code, so more difficult to refactor and high level optimization of code. (e.g. on multi-dimentional arrays, the order of indices (and so memory) and loops could improve sensible the speed, but also quickly the code will be impossible to maintain).

Related

What happens to allocated pages that are mostly empty?

If a process initially has a number of pages allocated to it in the heap, but a lot of the data in the pages has been deallocated, is there some sort of optimization that the OS does to consolidate the data into one page so that the other pages can be freed?

In general, nothing happens, the heap will continue to have "holes" in it.
Since the (virtual) memory addresses known by a process must remain valid, the operating system cannot perform "heap compaction" on its own. However, some runtimes like .Net do it.
If you are using C or C++, all you can hope for by default is that malloc() will be able to reuse previously deallocated chunks. But if your usage pattern is "allocate a lot of small objects then deallocate half of them at random," the memory utilization will probably not decrease much from the peak.

If a process initially has a number of pages allocated to it in the heap
A process will not initially have pages allocates in a heap.
is there some sort of optimization that the OS does to consolidate the data into one page so that the other pages can be freed
The operating system has no knowledge of user heaps. It allocates pages to the process. What that process does with those pages is up to it (i.e., use them for a heap, stack, code, etc.).
A process's heap manager can consolidate freed chunks of memory. When this occurs, it is normally done to fight heap fragmentation. However, I have never seen a heap manager on a paging system that unmaps pages once they are mapped by the operating system.

The heap of a process never has holes on it. The heap is part of the data segment allocated to a process, that grows dynamically upwards to the top of the stack segment, basically with the use of the sbrk(2) system call (that fixes a new size to the data segment) so the heap is a continuous segment (at least in terms of virtual address space) of allocated pages. malloc(3) never returns the heap space (or part of it) to the system. See malloc(3) for info about this. While there are memory allocators that allow a process to have several heaps (by means of allocating new memory segments, by use of the mmap(2) system call) the segments allocated by a memory allocator are commonly never returned back to the system.
What happens is that the memory allocator reuses the heap space allocated with sbrk(2) and mmap(2) and manages memory for being reused, but it is never returned back to the system.
But don't fear, as this is handled in a good and profitable way by the system, anyway.
That should not affect the overall system management, except from the fact that it consumes virtual address space, and probably page contents will end in the swap device if you don't use them until the process references them again and makes the system to reload them from the swap device(s). If your process doesn't reuse the holes it creates in the heap, the most probable destination is for the system to move them to the swap device and continue reusing it for other processes.
At this moment, I don't know if the system optimices swap allocation by not swapping out zeroed pages, as it does, for example, with text segments of executables (they never go to a swap device, because their contents are already swapped off in the executable file ---this was the reason you couldn't erase in ancient unices a program executable, or the reason there's not need anymore to use the sticky bit in frequently used programs---) but I think it doesn't (and the reason is that it's most improbable the unused pages will be zeroed by the application)
Be warned only in the case you have a 15Gb single process' heap use in your system and 90% of heap use is not in use most of the time. But think better in optimising the allocation resources because a process that consumes 15Gb of heap while most of the time 90%+ is unused, seems to be a poor design. If you have no other chance, simply provide enough swap space to your system to afford that.

Increase stack size

I'm doing computations with huge arrays and for some of this computations I need an increased stack size! Is there any downside of setting the stack size to unlimited (ulimit -s unlimited) in my ~/.bashrc?
The program is written in fortran(F77 & F90) and parallelized with MPI. Some of my arrays have more than 2E7 entries and when I use a small number of cores with MPI it crashes with segmentation fault.
The array size stays the same through the whole computation therefore I setted them to fixes value:
real :: p(200,200,400)
integer :: ib,ie,jb,je,kb,ke
...
ib=1;ie=199
jb=2;je=198
kb=2;ke=398
call SOLVE_POI_EQ(rank,p(ib:ie,jb:je,kb:ke),R)

Setting the stacksize to unlimited likely won't help you. You are allocating a chunk of 64MB on the stack, and likely don't fill it from the top, but from the bottom.
This is important, because the OS grows the stack as you go. Whenever it detects a page-fault right below the stack segment, it will assume that you need more space, and silently insert a new page. The size of this trigger-region within your address-space is limited, though, and I doubt that its larger than 64 MB. Since you index variables are likely placed below your array on the stack, accessing them already does the 64 MB jump that kills your process.
Just make your array allocatable, add the corresponding allocate() statement, and you should be fine.

Stack size is never really unlimited, so you would still have some failures. And your code still won't be portable to Linux systems with smaller (or normal-sized) stacks.
BTW, you should explain which kind of programs are you running, show some source code.
If coding in C++, using standard containers should help a lot (regarding actual stack consumption). For example, a local (stack allocated) std::vector<int> v(10000); (instead of int v[10000];) has its data allocated on the heap (and deallocated by the destructor when you exit from the block defining it)
It would be much better to improve your programs to avoid excessive stack consumption. The need of a lot of stack space is really a bug that you should try to correct. A typical rule of thumb is to have call frames smaller than a few kilobytes (so allocate any larger data on the heap).
You might consider also using the Boehm conservative garbage collector: you would use GC_MALLOC instead of malloc (and you would heap allocate large data structure using GC_MALLOC) but you won't have to bother to free your (GC-heap allcoated) data.

Why has a (C-)stack a maximum of 2mb?

This question is about stack overflows, so where better to ask it than here.
If we consider how memory is used for a program (a.out) in unix, it is something like this:
| etext | stack, 2mb | heap ->>>
And I have wondered for a few years now why there is a restriction of 2MB for the stack. Consider that we have 64 bits for a memory address, then why not allocate like this:
| MIN_ADDR MAX_ADDR|
| heap ->>>> <<<- stack | etext |
MAX_ADDR will be somewhere near 2^64 and MIN_ADDR somewhere near 2^0, so there are many bytes in between which the program can use, but are not necessarily accounted for by the kernel (by actually assigning pages for them). The heap and stack will probably never reach each other, and hence the 2MB limit is not needed ( and would instead have a ~1.8446744e+19 bytes limit). If we are scared that they will reach each other, then set the limit to 2^63 or some bizarre and enormous number.
Furthermore, the heap grows from low to high, so our kernel can still resize blocks of memory (allocated with for example malloc) without necessarily needing to shift the content.
Moreover, a stack frame is always static in size in some way. So we never need to resize there, if we do, that would be awkward anyway, since we also need to change the whole pointer structure used by return and created by call.
I read this as an answer on another stackoverflow question:
"My intuition is the following. The stack is not as easy to manage as the heap. The stack need to be stored in continuous memory locations. This means that you cannot randomly allocate the stack as needed, but you need to at least reserve virtual addresses for that purpose. The larger the size of the reserved virtual address space, the fewer threads you can create."
Source: Why is the page size of Linux (x86) 4 KB, how is that calcualted
But we have loads of memory addresses! So this makes no sense. So why 2MB?
The reason I ask is that allocating memory on the stack is quite safe with respect to dangling pointers and memory leaks:
e.g. I prefer
int foo[5];
instead of
int *foo = malloc(5*sizeof(int));
Since it will deallocate by itself. Also, allocation on the stack is faster than allocation executed by malloc. However, If I allocate an image (i.e. a jpeg or png) on the stack, I am in a dangerous zone of overflowing the stack.
Another point on this matter, why not also allow this:
int *huge_list_of_data = malloc(1000*sizeof(char), 10 000 000 000*sizeof(char))
where we allocate a list object, which has initially the size of 1KB, but we ask the kernel to allocate it such that the page it is put on is not used for anything else, and that we want to have 10GB of pages behind it, which can be (partially) swapped in when necessary.
This way we don't need 10GB of memory, we only need 10GB of memory addresses.
So why no:
void *malloc( unsigned long, unsigned long );
?
In essence: WHY NOT USE THE PAGING SYSTEM OF UNIX TO SOLVE OUR MEMORY ALLOCATION PROBLEMS?
Thank you for reading.

How does the amount of memory for a process get determined?

From my understanding, when a process is under execution it has some amount of memory at it's disposal. As the stack increases in size it builds from one end of the process (disregarding global variables that come before the stack), while the heap builds from another end. If you keep adding to the stack or heap, eventually all the memory will be used up for this process.
How does the amount of memory the process is given get determined? I can only imagine it depends on a bunch of different variables, but an as-general-as-possible response would be great. If things have to get specific, I'm interested in linux processes written in C++.

On most platforms you will encounter, Linux runs with virtual memory enabled. This means that each process has its own virtual address space, the size of which is determined only by the hardware and the way the kernel has configured it.
For example, on the x86 architecture with a "3/1" split configuration, every userspace process has 3GB of address space available to it, within which the heap and stack are allocated. This is regardless of how much physical memory is available in the system. On the x86-64 architecture, 128TB of address space is typically available to each userspace process.
Physical memory is separately allocated to back that virtual memory. The amount of this available to a process depends upon the configuration of the system, but in general it's simply supplied "on-demand" - limited mostly how much physical memory and swap file space exists, and how much is currently in use for other purposes.

The stack does not magically grow. It's size is static and the size is determined at linking time. So when you take enough space from the stack, it overflows (stack overflow ;)
On the other hand, the heap area 'magically' grows. Meaning that when ever more memory is needed for heap, the program asks operating system for more memory.
EDIT: As Mat pointed out below, the stack actually can increase during runtime on modern operating systems.

What is a Memory Heap?

What is a memory heap ?

Presumably you mean heap from a memory allocation point of view, not from a data structure point of view (the term has multiple meanings).
A very simple explanation is that the heap is the portion of memory where dynamically allocated memory resides (i.e. memory allocated via malloc). Memory allocated from the heap will remain allocated until one of the following occurs:
The memory is free'd
The program terminates
If all references to allocated memory are lost (e.g. you don't store a pointer to it anymore), you have what is called a memory leak. This is where the memory has still been allocated, but you have no easy way of accessing it anymore. Leaked memory cannot be reclaimed for future memory allocations, but when the program ends the memory will be free'd up by the operating system.
Contrast this with stack memory which is where local variables (those defined within a method) live. Memory allocated on the stack generally only lives until the function returns (there are some exceptions to this, e.g. static local variables).
You can find more information about the heap in this article.

A memory heap is a location in memory where memory may be allocated at random access. Unlike the stack where memory is allocated and released in a very defined order, individual data elements allocated on the heap are typically released in ways which is asynchronous from one another. Any such data element is freed when the program explicitly releases the corresponding pointer, and this may result in a fragmented heap. In opposition only data at the top (or the bottom, depending on the way the stack works) may be released, resulting in data element being freed in the reverse order they were allocated.

Heap is just an area where memory is allocated or deallocated without any order. This happens when one creates an object using the new operator or something similar. This is opposed to stack where memory is deallocated on the first in last out basis.

It's a chunk of memory allocated from the operating system by the memory manager in use by a process. Calls to malloc() et alia then take memory from this heap instead of having to deal with the operating system directly.

You probably mean heap memory, not memory heap.
Heap memory is essentially a large pool of memory (typically per process) from which the running program can request chunks. This is typically called dynamic allocation.
It is different from the Stack, where "automatic variables" are allocated. So, for example, when you define in a C function a pointer variable, enough space to hold a memory address is allocated on the stack. However, you will often need to dynamically allocate space (With malloc) on the heap and then provide the address where this memory chunk starts to the pointer.

A memory heap is a common structure for holding dynamically allocated memory.
See Dynamic_memory_allocation on wikipedia.
There are other structures, like pools, stacks and piles.

Memory organization is divided into two parts: heap memory and stack memory.
Heap memory is the main working memory, lowest address is the starting address.
In stack memory, the flow of data is driven by bottom to up approach. Then the memory Arch is named as stack.

every running process has its own private fake virtual memory provided by the OS.
the OS can map this to physical memory at any point as long as it is available otherwise it will map to disk and swap as needed.
this virtual memory is logically divided into segments for organizing different kinds of data.
the code segment holds the executable instructions.
the data segment holds static data such as global or static variables.
the stack holds local data that is automatically managed by called and returning functions.
all of these segments are fixed size even the stack its just the portion used can grow or shrink and is reclaimed as functions returned.
the only segment that is not preallocated at app startup and fixed size is the heap.
the app can request from the OS at runtime new memory to be allocated and the OS will reserve a part of your apps virtual space and commit that to physical memory as needed.
the OS will return a pointer to that newly allocated heap memory and that pointer holds the base or starting address of the new block. that pointer sits on the stack and when that stack space is reclaimed your pointer will be no longer in scope and therefore you have no means of access to that block of memory. and if you dont tell the OS you are done with it so it can reclaim it that is just zombie memory sitting there with no means of access and if your app keeps requesting memory while never giving it back it will crash when the system runs out of memory. so it is important to free or at least pass the pointer to another pointer external to the scope it was defined in so you can maintain an interface to that memory allocated in heap space. i would suggest looking into virtual memory further and understanding segments.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string