Microprocessor context switch - multithreading

I've started investigating FreeRTOS and looked into the task context save routine. This routine stores the registers and the stack pointer. My question is about the stack in different threads. What if a thread performs pushes and pops generated by the compiler? Wouldn't it be possible to overwrite the stack of a different thread?

Each thread must be allocated sufficient stack for its own call-stack plus that required for context storage. The amount of additional stack space required for context storage will depend on the target, but in the case of FreeRTOS specifically, the constant configMINIMAL_STACK_SIZE will be at least that size plus some margin.
On some targets where the thread stack is used in interrupt contexts, you will also need to account for stack usage by interrupts. If interrupts are nestable, the worst-case condition will be when all interrupts become active in priority order before any have completed. That is perhaps an unlikely scenario, but one you should consider.
Advice on stack allocation for FreeRTOS is provided in the FAQ at http://www.freertos.org/FAQMem.html#StackSize
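As a rough illustration of the budgeting described above, the following sketch adds up a task's own call-stack depth, the context-save area, and the worst-case usage of nested interrupts. All names and numbers are hypothetical and are not FreeRTOS APIs; real figures come from the port in use and from measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one interrupt handler's stack needs, in words. */
typedef struct {
    size_t frame_words; /* registers stacked on entry plus the handler's locals */
} isr_cost_t;

/* Worst case for nestable interrupts: every handler is active at once,
 * so their stack usage simply adds up. */
size_t worst_case_isr_words(const isr_cost_t *isrs, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += isrs[i].frame_words;
    }
    return total;
}

/* Total stack to allocate for one task, in words (assumes interrupts
 * run on the task's stack, which is only true on some targets). */
size_t task_stack_words(size_t call_stack_words, size_t context_words,
                        const isr_cost_t *isrs, size_t isr_count,
                        size_t margin_words)
{
    assert(isrs != NULL || isr_count == 0);
    return call_stack_words + context_words +
           worst_case_isr_words(isrs, isr_count) + margin_words;
}
```

With three handlers costing 16, 24 and 8 words, a 100-word call stack, a 17-word context and a 32-word margin, this gives 197 words.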

Related

How does "the Stack" play into the execution of a thread?

I am working on Pintos, which is an educational tool for learning about building operating systems, and I am on the second project, which is geared around building support for user programs.
So, first order of business is to Set up The Stack! Great.
Problem is, since the beginning of the class I've been shuddering at those words, The Stack, because I can never quite get a grasp of what The Stack is and how it plays into the execution of a program or thread. So I understand it is an area of memory set up in RAM, but that's about it.
My questions are as follows:
What is the function of the stack?
How does "The Stack" play into the execution of a thread in the CPU, with respect to the Program Counter, Registers, and Stack Pointer?
How are things added to the stack and how are they removed from it?
Furthermore, even if you don't know about Pintos, what does it mean to "set up the stack" when building support for user programs in an operating system?
A stack is just memory. The only thing that makes memory a stack is that the process accesses it Last In First Out.
What is the function of the stack?
The function of a stack in a computer is to support function calls. Function calls mirror the operation of a stack: calling a function pushes its return address and local data, and exiting the function pops them.
How does "The Stack" play into the execution of a thread in the CPU, with respect to the Program Counter, Registers, and Stack Pointer?
From the CPU's perspective a thread is no different from a process: each is just a stream of instructions with its own register values. Operating systems create threads by having multiple such streams share the same address space. Thus the process becomes a thread.
The program counter and stack pointer are registers. On most processors there are instructions that manipulate the stack pointer register. For example, a function call instruction will push the program counter on to the stack by decrementing the stack pointer and storing the program counter at the new location referenced by the stack pointer.
How are things added to the stack and how are they removed from it?
Stack memory is allocated by decrementing the stack pointer. Something like:
SUB #32, SP
will allocate 32 bytes on the stack and
ADD #32, SP
will free that memory. The advantage of the stack is that it is very fast for allocating memory.
In addition, as mentioned above, some instructions are likely to manipulate the stack.
Furthermore, even if you don't know about Pintos, what does it mean to "set up the stack" when building support for user programs in an operating system?
To set up a stack you have to:
Allocate memory for the stack.
You might also want to allocate guard memory that is protected on either side of the stack to detect overflows and underflows.
Move the address of the top of the stack into the stack pointer register.
As I said before, a stack is just memory. A program can easily allocate its own memory and move its address into the stack pointer to create a new stack.
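To make the "a stack is just memory" point concrete, here is a minimal sketch of a simulated downward-growing stack in C. The names (sim_stack_t, sim_push, sim_pop) are invented for illustration, and the "stack pointer" is an ordinary pointer variable rather than the real CPU register.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t *base; /* lowest address of the region */
    uint32_t *top;  /* one past the highest address of the region */
    uint32_t *sp;   /* simulated stack pointer */
} sim_stack_t;

/* "Set up the stack": allocate memory and point SP at its top. */
int sim_stack_init(sim_stack_t *s, size_t words)
{
    s->base = malloc(words * sizeof *s->base);
    if (s->base == NULL) {
        return -1;
    }
    s->top = s->base + words;
    s->sp = s->top; /* an empty stack starts at the highest address */
    return 0;
}

/* Push: decrement SP, then store at the new location. */
void sim_push(sim_stack_t *s, uint32_t value)
{
    assert(s->sp > s->base); /* would overflow the region */
    *--s->sp = value;
}

/* Pop: load from SP, then increment it. */
uint32_t sim_pop(sim_stack_t *s)
{
    assert(s->sp < s->top); /* would underflow the region */
    return *s->sp++;
}

void sim_stack_free(sim_stack_t *s)
{
    free(s->base);
    s->base = s->top = s->sp = NULL;
}
```

Pushing 1, 2, 3 and then popping three times returns 3, 2, 1, and the simulated stack pointer ends up back at the top of the region.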

Stacks management by operating system

I am an absolute beginner in operating systems. So please, do not mind if the question appears too naive or basic.
From what I've read, each process has its own Kernel stack and User stack. So does each thread. Threads of a process share the same address space. They also share the code and data segment, but not the stack.
But how is this possible? There is only one stack pointer in a CPU, so how can each thread have its own stack?
And what is the difference between a stack and a stack frame? From what I've read, there is only one stack and frames are pushed on it. Again, is it a physical stack? Do these stacks exist in virtual memory? Can someone please clear up my concepts? I am confused and cannot move forward.
From what I've read, each process has its own Kernel stack and User stack. So does each thread.
Each thread has its own kernel and user stack. Processes may contain any number of stacks -- at least one for each of their threads, possibly more.
Threads of a process share the same address space. They also share the code and data segment, but not the stack. But how is this possible?
Because the term "share" is being used in two different ways.
My wife and I both jointly own two cars, so in that sense, we share two cars. But I have one car that only I use and she has one car that only she uses. In that sense, we each have our own car.
Similarly, a process with two threads has two stacks that are shared. One is for each thread. So each thread has its own stack, though they can access each other's stacks if they wish to.
There is only one stack pointer in a CPU, so how can each thread have its own stack?
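Only the stack of the thread currently running on a CPU is referenced by that CPU's stack pointer. The stacks of the other threads can be sitting in memory (or even paged out to disk) without being used as a stack at that moment. Each such thread's stack pointer value is saved and is reloaded when the thread next runs.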
And what is the difference between a stack and a stack frame? From what I've read, there is only one stack and frames are pushed on it.
Right, so a single stack could have several frames pushed onto it. When one function finishes, it pops off its stack frame and returns to the caller with the caller's frame on the top of the stack.
Again, is it a physical stack?
I don't know what that means.
Do these stacks exist in virtual memory?
Yes. That's why one thread can easily access variables on another thread's stack if the address is passed from one to the other. A stack is just some memory that's being used as a stack.
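The following sketch shows one thread writing to a variable that lives on another thread's stack, using POSIX threads. The function names are made up for this example, and the only assumption is that a pthreads implementation is available.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Runs on the new thread's own stack, but writes through a pointer
 * that refers to the creating thread's stack. */
static void *writer(void *arg)
{
    assert(arg != NULL);
    int *slot = arg;
    *slot = 42;
    return NULL;
}

/* Returns the value the other thread stored in this function's local
 * variable, or -1 if the thread could not be created or joined. */
int read_value_written_by_other_thread(void)
{
    int local = 0; /* lives on the calling thread's stack */
    pthread_t t;

    if (pthread_create(&t, NULL, writer, &local) != 0) {
        return -1;
    }
    /* Joining keeps `local` alive until the writer is done with it. */
    if (pthread_join(t, NULL) != 0) {
        return -1;
    }
    return local;
}
```

The join matters: once the function returns, its frame is gone, and a pointer into it held by another thread would be dangling.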

How can there be multiple call stacks allocated at the same time? How does the stack pointer change between threads?

Summary of my understanding:
The top memory addresses are used for the stack (I initially thought there was only one call stack), and the stack grows downwards (What and where are the stack and heap?)
However, each thread gets its own stack allocated, so there should be multiple call stacks in memory (https://stackoverflow.com/a/80113/2415178)
Applications can share threads (e.g., the key application is using the main thread), but several threads can be running at the same time.
There is a CPU register called sp that tracks the stack pointer, the current stack frame of a call stack.
So here's my confusion:
Do all of the call stacks necessary for an application (if this is even possible to know) get allocated when the application gets launched? Or do call stacks get allocated/de-allocated dynamically as applications spin off new threads? And if that is the case, (I know stacks have a fixed size), do the new stacks just get allocated right below the previous stacks-- So you would end up with a stack of stacks in the top addresses of memory? Or am I just fundamentally misunderstanding how call stacks are being created/used?
I am an OS X application developer, so my visual reference for how call stacks are created comes from Xcode's stack debugger.
Now I realize that how things are here are more than likely unique to OS X, but I was hoping that conventions would be similar across operating systems.
It appears that each application can execute code on multiple threads, and even spin off new worker threads that belong to the application-- and every thread needs a call stack to keep track of the stack frames.
Which leads me to my last question:
How does the sp register work if there are multiple call stacks? Is it only used for the main call stack? (Presumably the top-most call stack in memory, and associated with the main thread of the OS) [https://stackoverflow.com/a/1213360/2415178]
Do all of the call stacks necessary for an application (if this is even possible to know) get allocated when the application gets launched?
No. Typically, each thread's stack is allocated when that thread is created.
Or do call stacks get allocated/de-allocated dynamically as applications spin off new threads?
Yes.
And if that is the case, (I know stacks have a fixed size), do the new stacks just get allocated right below the previous stacks-- So you would end up with a stack of stacks in the top addresses of memory? Or am I just fundamentally misunderstanding how call stacks are being created/used?
It varies. But the stack just has to be at the top of a large enough chunk of available address space in the memory map for that particular process. It doesn't have to be at the very top. If you need 1MB for the stack, and you have 1MB, you can just reserve that 1MB and have the stack start at the top of it.
How does the sp register work if there are multiple call stacks? Is it only used for the main call stack?
A CPU has as many register sets as threads that can run at a time. When the running thread is switched, the leaving thread's stack pointer is saved and the new thread's stack pointer is restored -- just like all other registers.
There is no "main thread of the OS". There are some kernel threads that do only kernel tasks, but user-space threads also run in kernel space to run the OS code. Pure kernel threads have their own stacks somewhere in kernel memory. But just like normal threads, it doesn't have to be at the very top; the stack pointer just has to start at the highest address in the chunk used for that stack.
There is no such thing as the "main thread of the OS". Every process has its own set of threads, and those threads are specific to that process, not shared. Typically, at any given point in time, most threads on a system will be suspended awaiting input.
Every thread in a process has its own stack, which is allocated when the thread is created. Most operating systems will leave some space between each stack to allow them to grow if needed, and to prevent them from colliding with each other.
Every thread also has its own set of CPU registers, including a stack pointer (pointing to a location in that thread's stack).

Functions and variable space with threading using clone

I currently intend to implement threading using clone(). My question is: if all threads use the same memory space, will each thread use a different part of memory when the same function is called, or do I have to do something to ensure this happens?
Each thread will be using the same memory map overall but a different, separate thread-local stack for function calls. When different threads have called the same function (which itself lives at the same executable memory location), any local variables will not be shared, because they are allocated on the stack upon entry into the function / as needed, and each thread has its own stack by default.
References to any static/global memory (i.e., anything not allocated on the thread-local stack, such as globals or references to the heap or mmap memory regions passed to / visible to a thread after calling clone, and in general, the full memory map and process context itself) will, of course, be shared and subject to the usual host of multithreading issues (i.e., synchronization of shared state).
Note that you have to set up this thread-local stack space yourself before calling clone. From the manpage:
The child_stack argument specifies the location of the stack used by
the child process. Since the child and calling process may share
memory, it is not possible for the child process to execute in the
same stack as the calling process. The calling process must therefore
set up memory space for the child stack and pass a pointer to this
space to clone().
Stacks grow downward on all processors that run Linux (except the HP
PA processors), so child_stack usually points to the topmost address
of the memory space set up for the child stack.
The child_stack parameter is the second argument to clone. It's the responsibility of the caller (i.e., the parent thread) to ensure that each child thread created via clone receives a separate and non-overlapping chunk of memory for its stack.
Note that allocating and setting up this thread-local stack memory region is not at all simple. Ensure that each allocation is page-aligned (the start address is on a page boundary, typically 4K), a multiple of the page size, amply sized (if you only have a few threads, 2MB is safe), and ideally followed by a "guard" section. The stack guard is some number of pages with no access privileges (no reading, no writing) placed just past the end of the usable stack region. If a thread exceeds its stack size (e.g., through deep recursion or functions with very large temporary buffers as local variables) and grows into the guard region, it fails early: the thread receives a SIGSEGV right away rather than insidiously corrupting neighboring memory. The stack guard is technically optional. You should probably be using mmap to allocate your stacks, although posix_memalign would do as well.
All that said, I've got to ask: why try to implement threading with clone to start? There are some very challenging problems here, and the POSIX threading library has solved them (in a portable way as well). If it's the fine-grained control of clone you want, then check out the pthread_attr_* functions; they pretty much cover every non-obscure use case (such as allowing you to allocate your own stack if you like, though from the previous discussion, I'd think you wouldn't). The very performant, general Linux implementation, amongst other things, fully wraps clone and a large variety of other heinous system calls relevant to threading, many of which do not even have C library wrappers and must be called via syscall. It all depends upon what you want to do.
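For anyone who does go the clone route, here is a minimal sketch of allocating one child stack with mmap and a single guard page. The names child_stack_t, child_stack_alloc and child_stack_free are invented for this example. It assumes a downward-growing stack, so the guard page sits at the lowest address of the mapping, and it stops short of calling clone itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    void  *mapping;     /* start of the whole mapping (guard page first) */
    size_t mapping_len; /* guard page plus usable stack, in bytes */
    void  *stack_top;   /* highest address: the value to pass as child_stack */
} child_stack_t;

/* Map a page-aligned region of at least usable_bytes and make its lowest
 * page inaccessible, so that running off the end of the stack faults
 * instead of silently corrupting whatever is mapped below it. */
int child_stack_alloc(child_stack_t *cs, size_t usable_bytes)
{
    assert(cs != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return -1;
    }
    size_t pg = (size_t)page;
    size_t usable = (usable_bytes + pg - 1) / pg * pg; /* round up to whole pages */
    size_t total = usable + pg;                        /* plus one guard page */

    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    if (mprotect(p, pg, PROT_NONE) != 0) {
        munmap(p, total);
        return -1;
    }
    cs->mapping = p;
    cs->mapping_len = total;
    cs->stack_top = (char *)p + total;
    return 0;
}

void child_stack_free(child_stack_t *cs)
{
    munmap(cs->mapping, cs->mapping_len);
}
```

Each child thread would get its own child_stack_t, which keeps the stacks separate and non-overlapping. The mapping must not be freed until that child has finished running.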

Detect the bounds of the stack of the current thread

I'm writing a semi-accurate garbage collector, and I'm wondering if there is a way to accurately detect the boundaries of the system-allocated stack for a given thread.
I need this so I can scan the stack for roots (pointers), and my current approach is to save the stack pointer when entering a new thread and when entering main(), and then to save it again for each thread when starting a garbage collection, using a global lock.
(I know that this approach is not ideal in the long run, since it causes unnecessary locking, but for now it is acceptable until the basics are up)
But I would really like to have a more "secure" way of detecting the boundaries, because it is quite easy for roots to 'escape' this mechanism (depending on the thread implementation -- pthreads is easy to manage, but not so with OpenMP or Grand Central Dispatch).
It would be even more awesome if there was a portable way to do this, but I don't expect that to be the case. I'm currently working on Mac OS X 10.6, but answers for any platform are welcome.
You could use the VM mechanism to write-protect the end of the stack, and expand on a VM write trap. Then you'd know the boundary of the stack within a stack page.
But I'm not sure I understand the objection to simply inspecting the current value of the SP for the existing thread, if you make the "big stack" assumption typically assumed by the small-number-of-threads parallelism community.
See this SO article for a discussion about a world model where you don't have "big stacks". Then you can't build a GC that simply scans "the stack", because it is heap allocated on demand (e.g., function entry).
Now you need to scan all the stack segments that might be live.
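Under the "big stack" assumption, some platforms let you ask the threading library for the bounds directly. The sketch below is not portable: it relies on the non-standard pthread_getattr_np on Linux/glibc and on pthread_get_stackaddr_np and pthread_get_stacksize_np on macOS, and returns -1 elsewhere. The wrapper name current_thread_stack_bounds is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stores the lowest address of the calling thread's stack in *low and
 * one past the highest address in *high. Returns 0 on success, -1 otherwise. */
int current_thread_stack_bounds(void **low, void **high)
{
    assert(low != NULL && high != NULL);
#if defined(__APPLE__)
    /* macOS reports the high end of the stack and its size. */
    pthread_t self = pthread_self();
    char *top = pthread_get_stackaddr_np(self);
    size_t size = pthread_get_stacksize_np(self);
    *high = top;
    *low = top - size;
    return 0;
#elif defined(__linux__)
    /* glibc reports the low end of the stack and its size. */
    pthread_attr_t attr;
    void *addr = NULL;
    size_t size = 0;
    if (pthread_getattr_np(pthread_self(), &attr) != 0) {
        return -1;
    }
    int rc = pthread_attr_getstack(&attr, &addr, &size);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        return -1;
    }
    *low = addr;
    *high = (char *)addr + size;
    return 0;
#else
    (void)low;
    (void)high;
    return -1; /* no known query on this platform */
#endif
}
```

A collector could call this once per thread and scan from the current stack pointer up to *high for a downward-growing stack. It does not help with segmented or heap-allocated stacks.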

Resources