Does every thread have its own main function?
I know that it has its own stack, but what about a main function (not necessarily a function that is literally called main)?
For example, when creating a thread, we pass a function as an argument for the new thread to run.
I tried to search about this topic, but couldn't find answers.
Quote from this article:
In a multi-threaded process, all of the process’ threads share the same memory and open files. Within the shared memory, each thread gets its own stack. Each thread has its own instruction pointer and registers. Since the memory is shared, it is important to note that there is no memory protection among the threads in a process.
Therefore, the «main» function of a thread could be said to be the function at which the thread's execution begins, i.e. the function whose first instruction's address is initially loaded into the instruction pointer. It is worth noting that the first code executed in a thread may actually be a routine in the standard library, which does some initialization and then calls the user-supplied function; it is that user-supplied function which could be called the «main» in this sense.
But this is not a common term; it is usually just called the thread function.
However, there is a concept, the main thread. This is the first thread that is executed when the program (process) starts.
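For example, in a minimal pthread program (an illustrative sketch, not from the question), the function passed to pthread_create plays that «main»-like role for the new thread, while main() itself runs only in the main thread:

#include <pthread.h>
#include <stdio.h>

/* This function plays the role of "main" for the new thread: it is where the
 * thread's instruction pointer ends up after a small library wrapper runs. */
static void *thread_function(void *arg) {
    (void)arg;
    printf("hello from the new thread\n");
    return NULL;
}

int main(void) {                 /* main() itself runs only in the main thread */
    pthread_t tid;
    pthread_create(&tid, NULL, thread_function, NULL);
    pthread_join(tid, NULL);
    return 0;
}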
Related
I’m reading the popular Operating System Concepts book but I can’t get how user threads are really scheduled. In particular, there’s a statement that confuses me:
“User-level threads are managed by a thread library, and the kernel is unaware of them”.
Let’s say I create the process A and 3 threads with the Pthreads library. My assumption is that user threads must necessarily be scheduled by the kernel (OS). Isn’t the OS responsible for allocating the CPU? Don’t threads have their own registers and stack? So there must be a context switch (a register switch) and therefore there must also be some handling by the OS. How can the kernel be unaware of them?
How exactly are user threads scheduled?
In the simplest implementation of user-level threads (a.k.a. "green threads"), there would be a function named yield(). The yield() function is the scheduler. When one thread calls it, it would:
Choose which thread should run next according to the scheduling policy. If it happens to choose the calling thread, it would then simply return. Otherwise, it would...
...Save, in the saved-context area for the calling thread, whatever registers a called function in the high-level language is obliged to preserve. This would include, at a minimum, the stack pointer. It would probably also include a frame pointer, and maybe a few general-purpose registers.
...Restore the registers from the saved-context area for the chosen thread. This would include the stack pointer, and since we're changing the stack pointer in mid-stream, so to speak, we'll have to be very careful about where the scheduler's own variables are kept. It won't really work for it to have "local" variables in the usual sense. There's a good chance that at least part of this "context switch" code will have to be written in assembly language.
Finally, all it has to do is a normal function return, and it will be returning from a yield() call in the chosen thread instead of the original thread.
The function to create a new thread would:
Allocate a block of memory for the thread's stack,
Construct an artificial stack frame in the new stack that makes it look as if the following function had just called yield() from the line marked "// 1":
void root_function(void (*thread_function)(void *args), void *args) {
    yield(); // 1
    thread_function(args);
    mark_my_own_saved_context_as_dead();
    yield(); // 2
}
When the thread_function() returns, the thread calls the mark_my_own_saved_context_as_dead() function to notify the scheduler algorithm that the thread is dead. When the thread calls yield() for the last time ("// 2"), the scheduler algorithm would free up its stack and clean up whatever else needs to be cleaned up before selecting some other thread to run.
In a typical implementation of green threads, there will be many other places where one thread implicitly yields to another. Any blocking I/O call, for example, or a sleep(), or any call to acquire a mutex or a semaphore.
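To make the above concrete, here is a minimal sketch of such a green-thread package. It uses the (old but widely available) ucontext API, getcontext(), makecontext() and swapcontext(), in place of the hand-written assembly save/restore described above; all the names (green_yield, green_spawn, struct green) are made up for this illustration, and freeing a dead thread's stack is left out:

/* Minimal cooperative (green) threads, sketched with ucontext. */
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define MAX_GREEN  8
#define STACK_SIZE (64 * 1024)

struct green {
    ucontext_t ctx;          /* saved SP, PC and callee-saved registers */
    void (*fn)(void *);
    void *arg;
    int alive;
};

static struct green threads[MAX_GREEN] = { [0] = { .alive = 1 } };
static int nthreads = 1;     /* slot 0 is the initial (main) thread */
static int current  = 0;

/* The scheduler: pick the next live thread round-robin and switch to it. */
void green_yield(void) {
    int prev = current, next = current;
    do {
        next = (next + 1) % nthreads;
    } while (!threads[next].alive && next != prev);
    if (next == prev)
        return;              /* nothing else to run */
    current = next;
    /* swapcontext() saves the caller's context and "returns" in the chosen thread */
    swapcontext(&threads[prev].ctx, &threads[next].ctx);
}

/* The root function: run the user's thread function, then mark the thread dead
 * and yield away for good (freeing the stack is elided in this sketch). */
static void green_root(void) {
    struct green *self = &threads[current];
    self->fn(self->arg);
    self->alive = 0;         /* mark_my_own_saved_context_as_dead() */
    green_yield();           /* never comes back */
}

/* Create a thread: allocate a stack and build the artificial initial frame. */
void green_spawn(void (*fn)(void *), void *arg) {
    struct green *g = &threads[nthreads++];
    g->fn = fn;
    g->arg = arg;
    g->alive = 1;
    getcontext(&g->ctx);
    g->ctx.uc_stack.ss_sp   = malloc(STACK_SIZE);
    g->ctx.uc_stack.ss_size = STACK_SIZE;
    g->ctx.uc_link = &threads[0].ctx;        /* safety net if green_root returns */
    makecontext(&g->ctx, green_root, 0);     /* first switch lands in green_root */
}

/* Demo: two threads interleave only at the explicit yield points. */
static void worker(void *arg) {
    for (int i = 0; i < 3; i++) {
        printf("%s %d\n", (const char *)arg, i);
        green_yield();
    }
}

int main(void) {
    green_spawn(worker, "A");
    green_spawn(worker, "B");
    while (threads[1].alive || threads[2].alive)
        green_yield();
    return 0;
}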
pthread_detach marks a thread so that when it terminates, its resources are automatically released without requiring the parent thread to call pthread_join. How can it do this? From the perspective of Linux in particular, there are two resources in particular I am curious about:
As an implementation detail, I would expect that if a wait system call is not performed on the terminated thread, then the thread would become a zombie. I assume that the pthread library's solution to this problem does not involve SIGCHLD, because (I think) it still works regardless of what action the program has specified to occur when SIGCHLD is received.
Threads are created using the clone system call. The caller must allocate memory to serve as the child thread's stack area before calling clone. Elsewhere on Stack Overflow, it was recommended that the caller use mmap to allocate the stack for the child. How can the stack be unmapped after the thread exits?
It seems to me that pthread_detach must somehow provide solutions to both of these problems; otherwise, a program that spawns and detaches many threads would eventually lose the ability to continue spawning new threads, even though the detached threads may have terminated already.
The pthreads library (on Linux, NPTL) provides a wrapper around lower-level primitives such as clone(2). When a thread is created with pthread_create, the library allocates the stack and stores that information plus any other metadata in a structure, and the function passed to clone is a wrapper function that calls the user-provided start function. When the user-provided start function returns, cleanup happens. Finally, an internal function called __exit_thread is called to make a system call to exit the thread.
When such a thread is detached, it still returns from the user-provided start function and calls the cleanup code as before, except that the stack and metadata are freed as part of this, since nobody is waiting for this thread to complete. That waiting would normally be handled by pthread_join.
If a thread is killed or exits without having run, then the cleanup is handled by the next pthread_create call, which will call any cleanup handlers yet to be run.
The reason a SIGCHLD is not sent to the parent, and wait(2) is not required, is that the CLONE_THREAD flag to clone(2) is used. The manual page says the following about this flag:
A new thread created with CLONE_THREAD has the same parent process as the process that made the clone call (i.e., like CLONE_PARENT), so that calls to getppid(2) return the same value for all of the threads in a thread group. When a CLONE_THREAD thread terminates, the thread that created it is not sent a SIGCHLD (or other termination) signal; nor can the status of such a thread be obtained using wait(2). (The thread is said to be detached.)
As you noted, this is required for the expected POSIX semantics to occur.
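For completeness, here is a minimal usage sketch (not from the question) of the pattern being discussed: threads are detached right after creation, so their stack and metadata are reclaimed by the library when they terminate, without any pthread_join:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *worker(void *arg) {
    printf("worker %ld running\n", (long)arg);
    return NULL;                  /* stack and metadata reclaimed on exit */
}

int main(void) {
    pthread_t tid;
    for (long i = 0; i < 3; i++) {
        pthread_create(&tid, NULL, worker, (void *)i);
        pthread_detach(tid);      /* no pthread_join needed; resources are
                                     released automatically on termination */
    }
    sleep(1);                     /* crude pause so the workers get to run */
    return 0;
}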
I have a question about using fork in a multi-threaded process.
If a process has multiple threads (already created using pthread_create and joined with pthread_join) and I call fork, will it copy the same functions assigned to the threads into the child process, or create a space where we can reassign the functions?
Read carefully what POSIX says about fork() and threads. In particular:
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
The child process will have a single thread running in the context of the calling thread. Other parts of the original process may be tied up by threads that no longer exist (so mutexes may be locked, for example).
The rationale section (further down the linked page) says:
There are two reasons why POSIX programmers call fork(). One reason is to create a new thread of control within the same program (which was originally only possible in POSIX by creating a new process); the other is to create a new process running a different program. In the latter case, the call to fork() is soon followed by a call to one of the exec functions.
The general problem with making fork() work in a multi-threaded world is what to do with all of the threads. There are two alternatives. One is to copy all of the threads into the new process. This causes the programmer or implementation to deal with threads that are suspended on system calls or that might be about to execute system calls that should not be executed in the new process. The other alternative is to copy only the thread that calls fork(). This creates the difficulty that the state of process-local resources is usually held in process memory. If a thread that is not calling fork() holds a resource, that resource is never released in the child process because the thread whose job it is to release the resource does not exist in the child process.
When a programmer is writing a multi-threaded program, the first described use of fork(), creating new threads in the same program, is provided by the pthread_create() function. The fork() function is thus used only to run new programs, and the effects of calling functions that require certain resources between the call to fork() and the call to an exec function are undefined.
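The practical consequence is the pattern below: in a multi-threaded program, fork() is followed almost immediately by an exec function, and the child restricts itself to async-signal-safe calls in between. This is only an illustrative sketch:

#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static void *worker(void *arg) {
    (void)arg;
    for (;;) pause();             /* some background thread doing its own work */
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);

    pid_t pid = fork();           /* the child is a replica of this thread only */
    if (pid == 0) {
        /* Only async-signal-safe operations are allowed here until exec. */
        execlp("ls", "ls", "-l", (char *)NULL);
        _exit(127);               /* _exit (not exit) is async-signal-safe */
    }
    waitpid(pid, NULL, 0);
    return 0;
}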
When creating a thread, we pass an entry point method/function. Why do I need this function, and what is its purpose?
The OS needs to know where a new thread of execution starts. When using a high-level programming language, one does not specify an address of machine instructions in memory to be executed in the context of a new thread, but instead uses execution units defined in the language, such as functions or methods. If thread creation worked like fork and execution of a new thread started at the point of the fork invocation, then both threads would have the same local variables, which usually reside on the stack. Even if a copy of the stack were created for the new thread, both threads would run the same clean-up code when leaving scopes (e.g., in C++ a smart pointer would be freed twice). So when you specify a starting point for a new thread, you can be sure it will allocate a stack frame of its own and the function's epilogue won't be executed twice.
A thread has to start somewhere. The pthread interface requires you to provide a function of the form
void *start_thread( void *arg );
void * is used because it can point to anything.
When a thread is created, the function you provide is called as the thread's starting point. Think of it like main() for the thread, but with different argument and return types.
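For example (a sketch with made-up values), the void * argument and return value let you pass data of any type in and out, as long as both sides agree on the cast:

#include <pthread.h>
#include <stdio.h>

static void *start_thread(void *arg) {
    int n = *(int *)arg;               /* cast the argument back to its real type */
    return (void *)(long)(n * n);      /* smuggle a small integer result back */
}

int main(void) {
    pthread_t tid;
    int n = 7;
    void *result;
    pthread_create(&tid, NULL, start_thread, &n);
    pthread_join(tid, &result);
    printf("7 squared is %ld\n", (long)result);
    return 0;
}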
Anatomy of a Program in Memory states that the libraries (DLLs etc.) are mapped in the memory-mapped segment of a process. Now, when a process runs and calls a function of a library, I believe that the program counter (PC) of the thread changes to the position of the function's code in the memory-mapped segment and then, after execution is complete, returns to the code segment. This makes sense if the function is synchronous, because we wait for the function call to complete and then move ahead in the code segment.
Now, consider an asynchronous programming model. The library, say MySql.dll, is loaded in the memory-mapped segment, and the main code calls an asynchronous function in the DLL. An asynchronous function means that the PC of the thread moves ahead in the code and the thread gets a callback when the called async procedure is completed. But in this case, the async procedure is within the address space of the thread. A thread can have only one PC, which begins executing the function in the DLL. Therefore, the main program in the code segment is stalled.
This leads me to believe that async programs are no good in single-threaded systems, because the program can't move ahead until the async function completes. If more than one thread were allowed, MySql.dll could spawn a new thread (which would have its own PC) and return control to the caller in the code segment. The PC in the code segment would proceed ahead and thus we could see some parallelization.
I know I am wrong somewhere, because async programming is very much possible in single-threaded systems (e.g., JavaScript). Therefore, I wanted to identify the fallacy in my arguments above. I have the following doubts; these may or may not be the source of my confusion:
Does every library share the address space with the linked process or has its own address space?
If a library has its own address space, that means it is a separate process. Does that mean calling a function in the library and library calling callback, involve IPC mechanisms?
EDIT:
The question above can be confusing, so I am going to explain the main scenario here using some notation.
A thread can have only one PC. Suppose a single-threaded environment. Process P1 has thread T1. Say P1 refers to a library L1 for an async function. During loading, L1 would have been mapped into the memory-mapped segment of P1. Now, when the code in T1 calls the async function of L1, the PC (program counter) of T1 moves to the L1 segment to execute the async function. One PC can't be in two places, so T1 doesn't proceed until the async function finishes. Then how does async benefit us in a single-threaded environment?
"But, in this case the async procedure is within the address space of the thread"
Think about what you mean by that. A procedure, whether sync or async, involves several pointers: the program counter points to the code, which is always outside the address range (not space) of the thread, while the stack-frame and stack-top pointers always belong to the address range of the thread and are used only while the procedure is running.
So from the address perspective, the async case is no different from the sync case.
And an address space always belongs to a process, not to a library or a thread. Libraries and threads each occupy parts (ranges) of the common address space, precisely so that they can work together.
UPDATE
"when the code in T1 calls the async function of L1, the PC (program counter) of T1 moves to the L1 segment to execute the async function" -
No, it does not. When the PC moves there, that is a synchronous call. An asynchronous call arranges a task that executes the async procedure later. See https://en.wikipedia.org/wiki/Asynchronous_method_invocation
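As an illustration of that last point, here is a toy single-threaded "event loop" in C (all names here are hypothetical): the async call merely enqueues a task and returns immediately; the same thread, with its single PC, runs the callback later when it gets around to draining the queue:

#include <stdio.h>

typedef void (*task_fn)(void *);

struct task { task_fn fn; void *arg; };

static struct task queue[16];
static int head = 0, tail = 0;

/* The "async call": arrange work for later, then return right away. */
static void async_call(task_fn fn, void *arg) {
    queue[tail].fn  = fn;
    queue[tail].arg = arg;
    tail = (tail + 1) % 16;
}

/* The callback that will eventually run on the same thread. */
static void query_done(void *arg) {
    printf("result for query: %s\n", (const char *)arg);
}

int main(void) {
    async_call(query_done, "SELECT 1");   /* returns immediately */
    printf("main keeps going; the callback has not run yet\n");

    while (head != tail) {                /* the event loop */
        struct task t = queue[head];
        head = (head + 1) % 16;
        t.fn(t.arg);                      /* same thread, same PC, just later */
    }
    return 0;
}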