Does using the program stack involve syscalls? - linux

I'm studying operating system theory, and I know that heap allocation involves a specific syscall, and that allocators usually optimize for this by requesting more memory than needed beforehand.
But I can't find information about stack allocation. What about it? Does it involve a specific syscall every time you read from or write to it (for example, when you call a function with some parameters)? Or is there some other mechanism that doesn't involve syscalls?

Typically, when the OS starts your program it examines the executable file's headers and arranges various areas for various things (an area for your executable's code, an area for your executable's data, etc.). This includes setting up an initial stack (and a lot more - e.g. finding shared libraries and doing dynamic linking).
After the OS has done all this, your executable starts executing. At this point you already have memory for a stack and can just use it without any system calls.
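As a minimal sketch of that (function names here are illustrative), local variables and arguments compile down to plain register-relative loads and stores on the already-mapped stack. Compiling the file below with gcc -S shows only sub/mov instructions adjusting rsp, and no syscall instruction:

```c
/* stack_use.c - locals live on the already-mapped stack; no syscalls needed.
 * Compile with: gcc -S stack_use.c  and inspect stack_use.s */
#include <stdio.h>

static int sum3(int a, int b, int c)
{
    int locals[3] = { a, b, c };   /* stored at rsp-relative offsets */
    return locals[0] + locals[1] + locals[2];
}

int main(void)
{
    /* printf eventually does a write() syscall; the stack use itself does not */
    printf("%d\n", sum3(1, 2, 3));
    return 0;
}
```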
Note 1: If you create threads, then there will probably be a system call involved to create the thread and that system call will probably allocate memory for the new thread's stack.
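As a sketch of that (the 1 MiB stack-size request is arbitrary): glibc's pthread_create typically obtains the new thread's stack with mmap behind the scenes, and running this under strace shows the mmap and clone calls:

```c
/* thread_stack.c - compile with: gcc thread_stack.c -pthread */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    puts("running on a freshly mapped stack");
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 1024 * 1024); /* request a 1 MiB stack */
    pthread_create(&tid, &attr, worker, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}
```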
Note 2: Typically there's "virtual memory" (what your program sees) and "physical memory" (what the hardware sees); and in between typically the OS does lots of tricks to improve performance and avoid wasting physical memory, and to hide resource limits (so you don't have to worry so much about running out of physical memory). One of these tricks is to allocate virtual memory (e.g. for a large stack) without allocating any actual physical memory, and then allocate the physical memory if/when the virtual memory is first modified. Other tricks include various "swap space" schemes, and memory mapped files. These tricks rely on requests generated by the CPU on your program's behalf (e.g. page fault exceptions) which aren't system calls, but have similar ("ask kernel to do something") characteristics.
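A small sketch of that lazy-allocation trick (sizes are arbitrary): the mmap call below succeeds immediately even for a large region, and physical pages are wired in by page faults only as each page is first written. Watching the RSS column in top or ps while it runs makes this visible:

```c
/* lazy_alloc.c - allocate virtual memory, defer physical memory */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1UL << 30;  /* 1 GiB of virtual address space */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    memset(p, 1, 4096 * 16); /* touching 16 pages faults in only those pages */
    printf("mapped %zu bytes at %p, touched 64 KiB\n", len, (void *)p);
    munmap(p, len);
    return 0;
}
```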
Note 3: All of the above depends on which OS. Different operating systems do things differently. I've chosen words carefully - e.g. "Typically" means that most modern operating systems work like I've described (but "typically" does not imply that all possible operating systems work like that; and some operating systems do not work like I've described).

No, the stack is normal memory. From the process's point of view there is no difference (hence the nasty security bug where you return a pointer to data on the stack, but the stack has since changed).
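That bug looks like this in C (a minimal sketch; most compilers warn about it, and modern GCC may even replace the returned address with NULL):

```c
/* dangling.c - the classic bug: returning a pointer to stack data.
 * The memory is "normal" and still addressable, but it is reused by the
 * next call frame, so the pointee silently changes. */
#include <stdio.h>

static int *broken(void)
{
    int local = 42;
    return &local;       /* undefined behavior: local dies on return */
}

int main(void)
{
    int *p = broken();   /* p now points into dead stack space */
    printf("%p\n", (void *)p); /* dereferencing p would be undefined */
    return 0;
}
```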
As Brendan wrote, the OS will set up the stack for the process at program load time. But if you access a not-yet-allocated page of the stack (e.g. because your stack is growing), the kernel may automatically allocate a new stack page for you. (This is not much different from when you try to allocate new memory on the heap and there is no more memory available in your program's space, except that in the heap case you explicitly make a syscall to tell the kernel you want more memory.)
You will notice that the stack usually grows in one direction and the heap (allocated memory) in the other (usually toward each other). So if your program needs more stack, there is space for it; and if your program does not need much stack, you can use the memory for e.g. a huge array. Or the contrary: if you do a lot of recursion, you allocate a lot of stack (but you probably need less heap memory).
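A quick probe of that layout (addresses vary per run because of ASLR, and exact placement is OS-specific, but on typical x86-64 Linux the stack address is far above the heap address):

```c
/* directions.c - print where a stack variable and a heap block live */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int on_stack = 0;
    int *on_heap = malloc(sizeof *on_heap);

    printf("stack variable: %p\n", (void *)&on_stack);
    printf("heap block:     %p\n", (void *)on_heap);
    free(on_heap);
    return 0;
}
```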
Two additional considerations: the CPU may have special stack instructions, but you can see them as syntactic sugar (you can simulate PUSH and POP with MOV, and CALL and RET with JMP plus the simulated PUSH and POP).
And the kernel may use a special stack for its own purposes (especially important for interrupts).

Related

How are stack and heap segment managed in x86 without utilizing the segmentation mechanism?

From Understanding the Linux Kernel:
Segmentation has been included in 80x86 microprocessors to encourage programmers to split their applications into logically related entities, such as subroutines or global and local data areas. However, Linux uses segmentation in a very limited way. In fact, segmentation and paging are somewhat redundant, because both can be used to separate the physical address spaces of processes: segmentation can assign a different linear address space to each process, while paging can map the same linear address space into different physical address spaces. Linux prefers paging to segmentation for the following reasons:
Memory management is simpler when all processes use the same segment register values—that is, when they share the same set of linear addresses.
One of the design objectives of Linux is portability to a wide range of architectures; RISC architectures, in particular, have limited support for segmentation.
The 2.6 version of Linux uses segmentation only when required by the 80x86 architecture.
The x86-64 architecture does not use segmentation in long mode (64-bit mode). Since x86 has segment registers it is not possible to not use them; instead, the bases of four of them (CS, SS, DS, and ES) are forced to 0, and the limit to 2^64. That raises a few questions:
Stack data (stack segment) and heap data (data segment) are mixed together; is popping from the stack (increasing the RSP register) still available?
How does the operating system know which type of data (stack or heap) is at a specific virtual memory address?
How do different programs share the kernel code by sharing memory?
Stack data (stack segment) and heap data (data segment) are mixed together; is popping from the stack (increasing the RSP register) still available?
As Peter states in the above comment, even though CS, SS, ES and DS are all treated as having zero base, this does not change the behavior of PUSH/POP in any way. It is no different than any other segment descriptor usage really. You could get overlapping segments even in 32-bit multi-segment mode if you point multiple selectors to the same descriptor. The only thing that "changes" in 64-bit mode is that you have a base forced by the CPU, and RSP can be used to point anywhere in addressable memory. PUSH/POP operations will work as usual.
How does the operating system know which type of data (stack or heap) is at a specific virtual memory address?
User-space programs can (and will) move the stack and heap around as they please. The operating system doesn't really need to know where the stack and heap are, but it can keep track of them to some extent, assuming the user-space application does everything according to convention, that is, uses the stack allocated by the kernel at program startup and the program break as the heap.
For the stack allocated by the kernel at program startup, or for a memory area obtained through mmap(2) with MAP_GROWSDOWN, the kernel tries to help by automatically growing the mapping when its size is exceeded (i.e. on stack overflow), but this has its limits. Manual MAP_GROWSDOWN mappings are rarely used in practice; POSIX threads and other more modern implementations use fixed-size mappings for thread stacks.
"Heap" is a pretty abstract concept in modern user-space applications. Linux provides user-space applications with the basic ability to manipulate the program break through brk(2) and sbrk(2), but this is rarely in a 1-to-1 correspondence with what we got used to call "heap" nowadays. So in general the kernel does not know where the heap of an application resides.
How do different programs share the kernel code by sharing memory?
This is simply done through paging. The kernel's code and data are mapped into the upper half of every process's virtual address space with supervisor-only page protections, so user-space shares them without being able to touch them. Classically, no page-table switch happens on a syscall at all; CR3 only changes on a context switch between tasks. With kernel page-table isolation (KPTI) enabled, the kernel additionally keeps a separate, minimal set of user page tables per task and switches CR3 when entering kernel-space (e.g. through a syscall) and again before giving control back to user-space code.

Memory available to assembly program in Linux

For fun I am just trying to write a program in assembly for Linux on a laptop with an x86 processor to get some system information. So one of the things I am trying to find out is how much memory is available to my program, where e.g. the stack is, and if and how I can allocate additional memory if needed.
A long time ago I did things like this on an Atari ST: there was just a system 'malloc' I could ask memory from, and there were functions to find the available memory.
I know Linux is set up differently and I kind of have the whole address space to myself, but I guess there are some memory areas I am not allowed to touch.
And somehow a default stack seems to have been set up.
I researched quite a bit for this, but I can't find any 'assembly' system call. Most people point to linking in C's malloc for memory management, but I am not looking for a memory manager. I just want to know the memory boundaries of my program.
I find things like getrlimit, setrlimit, prlimit and brk and sbrk, but those seem to be C functions and not system calls.
What am I missing?
Linux uses virtual memory (and ASLR). The Atari ST used neither, so it had a fixed memory map for some OS data structures and code. (Because the OS was in ROM and couldn't be easily updated, some people even documented some internal addresses.)
Linux tries to keep the boundary between kernel and user-space rigid, with a well-defined documented API / ABI for user-space to interact with the kernel via system calls. (e.g. on x86-64, via the syscall instruction). User-space doesn't need to care what's on the other side of that wall, and usually not even where its pages are in virtual memory as long as it has pointers to them.
When glibc malloc wants more pages from the OS, it uses mmap(MAP_ANONYMOUS) or brk to get them, and hand out chunks of them for small calls to malloc. It keeps bookkeeping data structures in user-space (so that's per-process of course).
I know Linux is set up differently and I kind of have the whole address space to myself, but I guess there are some memory areas I am not allowed to touch.
Yeah, every process has its own virtual address space. You can only touch the parts you've allocated, otherwise the resulting page fault will be "invalid" (OS knows there isn't supposed to be a physical page for that virtual page) and will deliver a SIGSEGV signal to your process if you try to read or write it. ("valid" page faults happen because of swap space or lazy allocation / copy-on-write; the kernel updates the HW page tables and returns to user-space for it to re-run the instruction that faulted.)
Also, the kernel claims the high half of virtual address space for its own use. (https://wiki.osdev.org/Higher_Half_Kernel). See also https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt for Linux's x86-64 memory map layout.
I can't find any 'assembly' system call.
mmap and brk are true system calls. See the "notes" section of the brk(2) man page. Section 2 man pages are system calls, section 3 are libc functions.
Of course in C when you call mmap(...), you're actually calling a wrapper function in glibc. glibc provides wrapper functions, not inline asm macros that use the syscall instruction directly.
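For example (a sketch; getpid is just a convenient, side-effect-free syscall to demonstrate with), the glibc wrapper and the generic syscall(2) escape hatch both end up executing the syscall instruction with the right number in rax:

```c
/* raw_syscall.c - same kernel entry point, two ways in */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    pid_t a = getpid();                      /* glibc wrapper */
    pid_t b = (pid_t)syscall(SYS_getpid);    /* generic raw-syscall wrapper */
    printf("wrapper: %d  raw: %d\n", (int)a, (int)b);
    return 0;
}
```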
See also The Definitive Guide to Linux System Calls which explains the asm interface, and also the VDSO pages. Linux maps some kernel memory (read-only) into your user-space process, holding code and data so gettimeofday() and clock_gettime() can run entirely in user-space.
Also various Q&As on Stack Overflow, including What are the calling conventions for UNIX & Linux system calls on i386 and x86-64
So one of the things I am trying to find out is how much memory is available to my program
There isn't a system call to query the current memory map of your process. Parsing /proc/self/maps would be your best bet.
See Finding mapped memory from inside a process for some fun ideas on using system calls to scan ranges of virtual address space for mapped pages. For example, Linux's mincore(2) syscall returns -ENOMEM if the specified range contains any unmapped pages.
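A sketch of that idea using mincore(2) (the is_mapped helper is made up for illustration): because mincore fails with ENOMEM on unmapped pages, it can distinguish mapped from unmapped addresses without risking a SIGSEGV:

```c
/* probe_maps.c - probe the address space with mincore(2) */
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static int is_mapped(const void *addr)
{
    unsigned char vec;
    long page = sysconf(_SC_PAGESIZE);
    void *base = (void *)((uintptr_t)addr & ~((uintptr_t)page - 1));
    return mincore(base, (size_t)page, &vec) == 0;  /* 0: page is mapped */
}

int main(void)
{
    int on_stack = 0;
    printf("stack page mapped: %d\n", is_mapped(&on_stack));
    printf("page 0 mapped:     %d\n", is_mapped((void *)0));
    return 0;
}
```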

Linux multiple page bounds and cpu segments

I am puzzled as to how Linux is able to have so many segments and still do bounds checking. To my knowledge, modern CPUs have only a few segment registers (code, data, etc.).
But Linux has multiple segments of its own: Stack, BSS, heap, code, globals, and many more (especially if the heap is large and composed of many segments). Not every CPU has enough registers to track all these segments.
If I am not mistaken, Linux stores each segment in its own set of pages, so how is it able to prevent reads or writes that go out of bounds of one of these areas?
My only possible explanations are that Linux:
performs some manual checking on every write
places all the pages close together in a way such that they can be tracked with a few registers
With the advent of 64-bit Intel, the concept of hardware segments has died the death that should have taken place in the 1970's.
But Linux has multiple segments of its own: Stack, BSS, heap, code, globals, and many more (especially if the heap is large and composed of many segments).
These are pedagogical concepts that have little relationship to reality outside the implementation of linkers, but which bad books on operating systems persist in using.
A stack is just memory. A heap is just memory. The operating system has no knowledge of whether memory is being used for a stack or for a heap. The operating system simply allocates memory to a process with different attributes (e.g., read/write, read only, read/execute). What the process does with that memory is its own business.
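A sketch of that point: the kernel tracks per-page attributes, which you can change with mprotect(2); nothing marks a mapping as "stack" or "heap":

```c
/* attrs.c - the kernel tracks per-page attributes, not "segments" */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    p[0] = 'x';                      /* fine: page is read/write */
    mprotect(p, 4096, PROT_READ);    /* now read-only... */
    printf("%c\n", p[0]);            /* ...reading still works */
    /* p[0] = 'y'; would now raise SIGSEGV */
    munmap(p, 4096);
    return 0;
}
```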

What is "stack hog"

What does "stack hog" means when talking about Linux kernel?
I read this notion on some linux kernel books(Professional Linux Kernel Architecture by Wolfgang Mauerer), but what exactly does "stack hog" means? Thanks.
"Stack hog" is an informal name used to describe functions that use significant amounts of automatic storage (AKA "the stack"). What exactly counts as "hogging" varies by the execution environment: in general, kernel-level functions have tighter limits on the stack space - just a few kilobytes, so a function considered a "stack hog" in kernel mode may become a "good citizen" in user mode.
A common reason for a function to become a stack hog is allocating buffers or other arrays in automatic memory. This is more convenient, because you do not need to remember to free the memory or check the result of an allocation, and you may save some CPU cycles on the allocation itself. The downside is the possibility of overflowing the stack, which results in a panic for kernel-level code. That is why a common remedy for "stack hogging" is moving some of your buffers into dynamic memory.
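The shape of that remedy, as a user-space sketch (in the kernel the dynamic allocation would be kmalloc/kfree rather than malloc/free; sizes are illustrative):

```c
/* hog.c - move a big buffer off the stack */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void hog(void)
{
    char buf[8192];              /* 8K of automatic storage: a stack hog */
    memset(buf, 0, sizeof buf);
}

void good_citizen(void)
{
    char *buf = malloc(8192);    /* lives on the heap instead */
    if (!buf) return;            /* the price: you must check and free */
    memset(buf, 0, 8192);
    free(buf);
}

int main(void)
{
    hog();
    good_citizen();
    return 0;
}
```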
The Linux kernel uses small, fixed-size kernel stacks (historically 4K or 8K on x86; 16K on modern x86-64). Using an inordinate amount of that small space is considered being a hog. If you are "lazy" and allocate a large buffer on the stack, or have a function with a large number of parameters, that is being a hog.
The stack must hold any sequence of calls needed to service a system call as well as any interrupt handlers that may be called. So it is very important to conserve stack space.

How does the amount of memory for a process get determined?

From my understanding, when a process is executing it has some amount of memory at its disposal. As the stack increases in size it builds from one end of the process's memory (disregarding global variables that come before the stack), while the heap builds from the other end. If you keep adding to the stack or heap, eventually all the memory will be used up for this process.
How does the amount of memory the process is given get determined? I can only imagine it depends on a bunch of different variables, but an as-general-as-possible response would be great. If things have to get specific, I'm interested in Linux processes written in C++.
On most platforms you will encounter, Linux runs with virtual memory enabled. This means that each process has its own virtual address space, the size of which is determined only by the hardware and the way the kernel has configured it.
For example, on the x86 architecture with a "3/1" split configuration, every userspace process has 3GB of address space available to it, within which the heap and stack are allocated. This is regardless of how much physical memory is available in the system. On the x86-64 architecture, 128TB of address space is typically available to each userspace process.
Physical memory is separately allocated to back that virtual memory. The amount of this available to a process depends upon the configuration of the system, but in general it's simply supplied "on demand" - limited mostly by how much physical memory and swap space exists, and how much is currently in use for other purposes.
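As a sketch of the knobs involved, getrlimit(2) reports the soft limits that cap the main thread's stack and the total address space (the resource names are the standard ones from <sys/resource.h>):

```c
/* limits.c - query per-process stack and address-space limits */
#include <stdio.h>
#include <sys/resource.h>

static void show(const char *name, int resource)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0) { perror(name); return; }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("%s: unlimited\n", name);
    else
        printf("%s: %llu bytes\n", name, (unsigned long long)rl.rlim_cur);
}

int main(void)
{
    show("RLIMIT_STACK", RLIMIT_STACK);  /* main-thread stack size cap */
    show("RLIMIT_AS", RLIMIT_AS);        /* total virtual address space cap */
    return 0;
}
```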
The stack does not magically grow. Its size is static, determined at link time. So when you take up enough space on the stack, it overflows (stack overflow ;)).
On the other hand, the heap area 'magically' grows: whenever more memory is needed for the heap, the program asks the operating system for more memory.
EDIT: As Mat pointed out below, the stack actually can increase during runtime on modern operating systems.
