where thread is implemented in memory?

where thread is implemented in memory? - linux

We know that thread has its own stack it's implemented within the process. But my question is that when thread is implemented in his own stack that time it is the same stack which used by process or any other function?
One more doubt that thread share it's global variable,file descriptor, signal handler etc. But how it's share all these parameters within same address where all the threads executed?
Brief explanation will be appreciated.

when thread is implemented in his own stack that time it is the same stack which used by process or any other?
Can't quite parse this but I get the gist I think.
In most cases, under Linux in a multithreaded application, all of the threads share the same address space. Each thread if it is running on a separate processor may have local cached memory but the overall address space is shared by all threads. Even per-thread stack space is shared by all threads -- just that each thread gets a different contiguous memory area.
But how it's share all these parameters within same address?
This is also true of the global variables, file descriptors, etc.. They are all shared.
Most thread implementations running under Linux use the clone(2) syscall to create new thread processes. To quote from the clone man page:
clone() creates a new process, in a manner similar to fork(2). It is actually a library function layered on top of the underlying clone() system call, hereinafter referred to as sys_clone. A description of sys_clone is given toward the end of this page.
Unlike fork(2), these calls allow the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers.
You can see the cloned processes by using ps -eLf under Linux.

Related

Is a coroutine a kind of thread that is managed by the user-program itself (rather than managed by the kernel)?

In my opinion,
Kernel is an alias for a running program whose program text is in the kernel area and can access all memory spaces;
Process is an alias for a running program whose program has an independent memory space in the user memory area. Which process can get the use of the CPU is completely managed by the kernel;
Thread is an alias for a running program whose program-text is in the memory space of a process and completely shares the memory space with another thread of the same process. Which thread can get the use of the CPU is completely managed by the kernel;
Coroutine is an alias for a running program whose program-text is in the memory space of a process.And it is a user thread that the process decides itself (not the kernel) how to use, and the kernel is only responsible for allocating CPU resources to the process.
Since the process itself has no right to schedule like the kernel, the coroutine can only be concurrent but not parallel.
Am I correct in saying Above?

process is an alias for a running program...
The modern way to think of a process is to think of it as a container for a collection of threads and the resources that those threads need to execute.
Every process (except for "zombie" processes that can exist in some systems) must have at least one thread. It also has a virtual address space, open file handles, and maybe sockets and other resources that are shared by the threads.
Thread is an alias for a running program...
The problem with saying that is, "running program" sounds too much like "process," and a thread is most definitely not a process. (E.g., a thread can only exist in a process.)
A computer scientist might tell you that a thread is one particular execution of the application's code. I like to think of a thread as an independent agent who executes the code.
coroutine...is a user thread...
I'm going to mostly leave that one alone. "Coroutine" seems to mean something different from the highly formalized, and not particularly useful coroutines that I learned about more than forty years ago. What people call "coroutines" today seem to have somewhat in common with what I call "green threads," but there are details of how and when and why they are used that I don't yet understand.
Green threads (a.k.a., "user mode threads") simply are threads that the kernel doesn't know about. They are pretty much just like the threads that the kernel does know about except, the kernel scheduler never preempts them because, Duh! it doesn't know about them. Context switches between green threads can only happen at specific points where the application allows it (e.g., by calling a yield() function or, by calling some library function that is a documented yield point.)
kernel is an alias for a running program...
The kernel also is most definitely not a process.
I don't know every detail about every operating system, but the bulk of kernel code does not run independently of the applications that the kernel serves. It only runs when an application thread enters kernel mode by making a system call. The thread that runs the kernel code still belongs to the application process, but the code that determines the thread's behavior at that point is written or chosen by the kernel developers, not by the application developers.

Dynamic variable declaration inside a thread

As I got to know that apart from data segment and code segment threads also share heap segment
What resources are shared between threads??
then if I create a variable dynamically using malloc() or calloc() inside the thread then does that variable would be accessible to all the other threads of the same process?

Theoretically, if you know the memory address. Yes, heap allocated variables should be accessible from any thread within the same process.
{malloc, calloc, realloc, free, posix_memalign} of glibc-2.2+ are thread safe
http://linux.derkeiler.com/Newsgroups/comp.os.linux.development.apps/2005-07/0323.html
Original post
Generally, malloc/new/free/delete on multi threaded systems are thread safe, so this should be no problem - and allocating in one thread , deallocating in another is a quite common thing to do.
As threads are an implementation feature, it certainly is implementation dependant though - e.g. some systems require you to link with a multi threaded runtime library.
And this
Besides also answered in: the link you posted
Threads differ from traditional multitasking operating system
processes in that:
processes are typically independent, while threads exist as subsets of
a process processes carry considerable state information, whereas
multiple threads within a process share state as well as memory and
other resources processes have separate address spaces, whereas
threads share their address space processes interact only through
system-provided inter-process communication mechanisms. Context
switching between threads in the same process is typically faster than
context switching between processes.
So, yes it is.

How can there be multiple call stacks allocated at the same time? How does the stack pointer change between threads?

Summary of my understanding:
The top memory addresses are used for the? (I initially thought there was only one call stack) stack, and the? stack grows downwards (What and where are the stack and heap?)
However, each thread gets it's own stack allocated, so there should be multiple call stacks in memory (https://stackoverflow.com/a/80113/2415178)
Applications can share threads (e.g, the key application is using the main thread), but several threads can be running at the same time.
There is a CPU register called sp that tracks the stack pointer, the current stack frame of a call stack.
So here's my confusion:
Do all of the call stacks necessary for an application (if this is even possible to know) get allocated when the application gets launched? Or do call stacks get allocated/de-allocated dynamically as applications spin off new threads? And if that is the case, (I know stacks have a fixed size), do the new stacks just get allocated right below the previous stacks-- So you would end up with a stack of stacks in the top addresses of memory? Or am I just fundamentally misunderstanding how call stacks are being created/used?
I am an OS X application developer, so my visual reference for how call stacks are created come from Xcode's stack debugger:
Now I realize that how things are here are more than likely unique to OS X, but I was hoping that conventions would be similar across operating systems.
It appears that each application can execute code on multiple threads, and even spin off new worker threads that belong to the application-- and every thread needs a call stack to keep track of the stack frames.
Which leads me to my last question:
How does the sp register work if there are multiple call stacks? Is it only used for the main call stack? (Presumably the top-most call stack in memory, and associated with the main thread of the OS) [https://stackoverflow.com/a/1213360/2415178]

Do all of the call stacks necessary for an application (if this is even possible to know) get allocated when the application gets launched?
No. Typically, each thread's stack is allocated when that thread is created.
Or do call stacks get allocated/de-allocated dynamically as applications spin off new threads?
Yes.
And if that is the case, (I know stacks have a fixed size), do the new stacks just get allocated right below the previous stacks-- So you would end up with a stack of stacks in the top addresses of memory? Or am I just fundamentally misunderstanding how call stacks are being created/used?
It varies. But the stack just has to be at the top of a large enough chunk of available address space in the memory map for that particular process. It doesn't have to be at the very top. If you need 1MB for the stack, and you have 1MB, you can just reserve that 1MB and have the stack start at the top of it.
How does the sp register work if there are multiple call stacks? Is it only used for the main call stack?
A CPU has as many register sets as threads that can run at a time. When the running thread is switched, the leaving thread's stack pointer is saved and the new thread's stack pointer is restored -- just like all other registers.
There is no "main thread of the OS". There are some kernel threads that do only kernel tasks, but also user-space threads also run in kernel space to run the OS code. Pure kernel threads have their own stacks somewhere in kernel memory. But just like normal threads, it doesn't have to be at the very top, the stack pointer just has to start at the highest address in the chunk used for that stack.

There is no such thing as the "main thread of the OS". Every process has its own set of threads, and those threads are specific to that process, not shared. Typically, at any given point in time, most threads on a system will be suspended awaiting input.
Every thread in a process has its own stack, which is allocated when the thread is created. Most operating systems will leave some space between each stack to allow them to grow if needed, and to prevent them from colliding with each other.
Every thread also has its own set of CPU registers, including a stack pointer (pointing to a location in that thread's stack).

Stack for threads of a process in Linux

How is stack space allocated (in the same address space) to each thread of a process in Linux or any other OS for that matter?

It depends on the type of thread library, a user space library like pthreads would allocate memory and divide it into thread stacks. On the OS side each thread would get a kernel stack.

On creation of new thread, the operating system reserves space in stack segment for current thread (parent), where the future auto variables and function call data of parent will live. Then, it allocates one guard page (this is to prevent the parent colliding into child stack, but this may vary with different operating systems). Once this is done, the stack frame for child thread is created (which is typically one-two page(s)).
This process is repeated in case the parent spawns multiple threads. All these stack frames live in stack segment of address space of process whose all these threads are part of.

Functions and variable space with threading using clone

I currently intend to implement threading using clone() and a question is, if I have all threads using the same memory space, with each function I call in a given thread, will each thread using a different part of memory when the same function is called, or do I do have todo something to ensure this happens?

Each thread will be using the same memory map overall but a different, separate thread-local stack for function calls. When different threads have called the same function (which itself lives at the same executable memory location), any local variables will not be shared, because they are allocated on the stack upon entry into the function / as needed, and each thread has its own stack by default.
References to any static/global memory (i.e., anything not allocated on the thread-local stack, such as globals or references to the heap or mmap memory regions passed to / visible to a thread after calling clone, and in general, the full memory map and process context itself) will, of course, be shared and subject to the usual host of multithreading issues (i.e., synchronization of shared state).
Note that you have to setup this thread-local stack space yourself before calling clone. From the manpage:
The child_stack argument specifies the location of the stack used by
the child process. Since the child and calling process may share
memory, it is not possible for the child process to execute in the
same stack as the calling process. The calling process must therefore
set up memory space for the child stack and pass a pointer to this
space to clone().
Stacks grow downward on all processors that run Linux (except the HP
PA processors), so child_stack usually points to the topmost address
of the memory space set up for the child stack.
The child_stack parameter is the second argument to clone. It's the responsibility of the caller (i.e., the parent thread) to ensure that each child thread created via clone receives a separate and non-overlapping chunk of memory for its stack.
Note that allocating and setting-up this thread-local stack memory region is not at all simple. Ensure that your allocations are page-aligned (start address is on a 4K boundary), a multiple of the page size (4K), amply-sized (if you only have a few threads, 2MB is safe), and ideally contains a "guard" section following the usable space. The stack guard is some number of pages with no access privileges-- no reading, no writing-- following the main stack region to guard the rest of the virtual memory address space should a thread dynamically exceed its stack size (e.g., with a bunch of recursion or functions with very large temporary buffers as local variables) and try to continue to grow into the stack guard region, which will fail-early as the thread will be served a SIGSEGV right away rather than insidiously corrupting. The stack guard is technically optional. You should probably be using mmap to allocate your stacks, although posix_memalign would do as well.
All that said, I've got to ask: why try to implement threading with clone to start? There are some very challenging problems here, and the POSIX threading library has solved them (in a portable way as well). If it's the fine-grained control of clone you want, then checkout the pthread_attr_* functions; they pretty much cover every non-obscure use case (such as allowing you to allocate your own stack if you like-- from the previous discussion, I'd think you wouldn't). The very performant, general Linux implementation, amongst other things, fully wraps clone and a large variety of other heinous system calls relevant to threading-- many of which do not even have C library wrappers and must be called via syscall. It all depends upon what you want to do.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string