How OS understand what data for what thread? - multithreading

I can figure out how OS understand which data for which thread?
For example thread/process A send request to DB, then scheduler switch on process B. But when data is coming (answer from DB/FS or whatever) CPU run process C.
How OS understand who is owner of data? Where they are storing before OS will send them to proper thread?

An entire book is needed to answer your question. So read Operating Systems: three easy pieces.
You want to learn more about inter-process communication.
If using Linux, read also Advanced Linux Programming. See also the list of syscalls(2).
The operating system (often, its kernel) is managing data, virtual address spaces, and processes (and threads), so is bookkeeping a lot of meta data about them, notably their owner. The very notion of owner is provided by the OS (at the hardware level, that notion does not exist; but read about CPU modes). For Linux, see also credentials(7) & capabilities(7) & pthreads(7), etc...

Related

How an OS figures out an identity of a thread calling the GetCurrentThreadId()?

I'm trying to understand how an OS figures out what thread is a current one (for example, when the thread calls gettid() or GetCurrentThreadId()). Since a process address space is shared between all threads, keeping a thread id there is not an option. It must be something unique to each thread (i.e. stored in its context). If I was an OS developer, I would store it in some internal CPU register readable only in kernel mode. I googled a lot but haven't found any similar question (as if it was super obvious).
So how is it implemented in real operating systems like Linux or Windows?
You are looking for Thread Control Block(TCB).
It is a data structure that holds information about threads.
A light reading material can be found here about the topic:
https://www.cs.duke.edu/courses/fall09/cps110/slides/threads2.3.ppt
But I would recommend getting a copy of Modern Operating Systems by Andrew S. Tanenbaum if you are interested in OS.
Chapter 2 Section 2.2 Threads:
Implementing Threads in User Space - "When threads are managed in user space, each process needs its own private
thread table to keep track of the threads in that process."
Implementing threads in the Kernel - "The kernel has a thread table that keeps track
of all the threads in the system."
Just an edit you might also want to read "SCHEDULING". In a general manner you can say kernel decides which thread/process should be using the CPU.Thus kernel knows which thread/process made a system call. I am not going into detail because it depends on which OS we are talking about.
I believe this has already been very well explained in this question: how kernel distinguishes between thread and process
If you want to find out more, you can also google for the kernel task structure and see what info is stored about each type of processes running in the user space
The answer to your question is entirely system specific. However, most processors know nothing about threads. They only support processes. Threads are generally implemented by created separate processes that share the same address space.
When you do a system service call to get a thread ID it is going to be implemented in the same general fashion as system service to get the process id. Imagine how a get process ID function could work in a system that does not support threads. And to keep it simple, let's assume a single processor.
You are going to have some kind of data structure to represent the current process and the kernel is going to have some means of identifying the current process (e.g. a pointer in the kernel address space to that process). On some processors there is a current task register that points to a structure defined by the processor specification. An operating system can usually add its own data to the end of this structure.
So now I want to upgrade this operating system to support threads. To that I must have a data structure that describes the thread. In that structures I have a pointer to a structure that defines the process.
Then get thread ID works the same way get process ID worked before. But now Get Process ID has an additional step that I have to translate the thread to the process to get its id (which may even be included in the thread block).

Python multithreading model

I have been studying multithreading in python for a while, however I was confused on a few issues-
Firstly, are the threads created by the python threading library user level or kernel level threads?
Books say that user level threads must be mapped to kernel threads and
the operating system only creates and maintains kernel level threads.
Which thread model will be used in the python threading library? Further, who makes the choice between kernel and user level threads? Is it the operating system or can the programmer have a say?
If the many-to-one model (illustrated in the picture) is used, I think it is not real multithreading, since all the threads map to a single kernel thread.
Is there a way to direct the operating system to adhere to a certain threading model in my python program?
Can all running threads for a process be shown with their state separately marked as either kernel or user level. Also can the mappings between the two levels (user and kernel) be shown?
Usually, you never create 'kernel level threads' directly - everything you do in user space executes in user space, otherwise even a random browser JavaScript would be executing at the kernel level guaranteeing that within seconds the whole internet would go dark.
Thus, in most languages, a threading interface (if supported) is far removed from the actual 'kernel threads' and depending on implementation it will either link to a lower-level threading interface (pthreads for example) or just simulate threading unbeknownst to the user. Going down that chain, pthreads may or may not link to actual 'kernel' threads (it happens to be true on Linux, but on Windows there is another level of separation) but even then, the code executes in the user space - the 'supporting' kernel thread is there to control the scheduling the code runs separately.
When it comes to CPython, its threading interface links to pthreads so, technically, there is a chain from a Python thread all the way down to the kernel threads. However, Python also has the dreaded GIL pretty much guaranteeing that, with some rare exceptions mostly related to I/O, no two threads ever execute at the same time, pretty much making its threads operate in a cooperative multitasking mode. However, since on most systems processes are also backed by kernel threads, you can still utilize them in all their glory by using the multiprocessing interface.
Also, until you have multiple cores/CPUs on your system even kernel threads execute in a cooperative multitasking mode so, technically, kernel threads don't guarantee actual multi-threading as you're describing it.
As for how to list threads and their dependencies, you can use top -H -p <pid> to show the thread tree of a process.

Do all types of interprocess/interthread communication need system calls?

In Linux,
do all types of interprocess communication need system calls?
Types of interprocess communication are such as
Pipes
Signals
Message Queues
Semaphores
Shared Memory
Sockets
Do all types of interthread communication need system calls?
I would like to know if all interprocess communications and interthread communications involve switching from user mode to kernel mode so that the OS kernel will run to perform the communications? Since system calls all involve such switch, I asked if the communications need system calls.
For example, "Shared memory" can be used for both interprocess and interthread communcations, but i am not sure if it requires system calls or involvement of OS kernel to take over the cpu to perform something.
Thanks.
For interprocess communication I am pretty sure you cannot avoid system calls.
For interthread communication I cannot give you a definitive answer, but my educated guess would be "yes-and-no". You see, you can communicate between threads using thread-safe queues, and the only thing that a thread-safe queue needs in order to work is a lock. If a lock is unavailable at the moment that a thread wants to obtain it, then of course the system must be involved in order to put the thread in a waiting mode. But if the lock is available to obtain, then the thread should be able to proceed without the need for any system call.
That's what I would guess, and I would be quite disappointed to find out that things do not actually work this way, because that would mean that code which I have up until now been considering pretty innocent in fact has a tremendous additional hidden overhead.
Yes, every IPC was set by some syscalls(2).
It might happen that some IPC was set by a previous program (e.g. the program in the same process before execve), for example when running a pipeline like ls | ./yourprog it is the shell which has called pipe(2), not yourprog.
Since threads -in the same process- (by definition) share a common address space they can communicate using some shared data. However, they often need some syscall for synchronization (e.g. with mutexes), see e.g. futex(7) - because you want to avoid spinlocks (i.e. wasting CPU power for waiting). But in practice you should use pthreads(7)
In practice you cannot use shared memory (like shm_overview(7)) without synchronization (e.g. with semaphores, see sem_overview(7)). Notice that cache coherence is tricky and makes memory model sometimes non-intuitive (and processor specific).
At least, you do not need a system call for each read/write to shared memory. Setting up shared memory will for sure and synchronizing threads/processes will often involve system calls.
You could use flags in shared memory for synchronization, but note that read and write of flags may not be atomic actions.
(For example if you set up a location in shared memory to be 0 in the beginning and then check for it to be non-zero, while the other process sets it to non-zero when ready for something)

How do programs communicate with each other?

How do procceses communicate with each other? Using everything I've learnt to fo with programming so far, I'm unable to explain how sockets, file systems and other things to do with sending messages between programs work.
Btw I use a Linux based OS if your going to add anything OS specific. Thanks in advance. The question's been bugging me for ages. I'm also guessing the kernel has something to do with it.
In case of most IPC (InterProcess Communication) mechanisms, the general answer to your question is this: process A calls the kernel passing a pointer to a buffer with data to be transferred to process B, process B calls the kernel (or is already blocked on a call to the kernel) passing a pointer to a buffer to be filled with data from process A.
This general description is true for sockets, pipes, System V message queues, ordinary files etc. As you can see the cost of communication is high since it involves at least one context switch.
Signals constitute an asynchronous IPC mechanism in which one process can send a simple notification to another process triggering a handler registered by the second process (alternatively doing nothing, stopping or killing that process if no handler is registered, depending on the signal).
For transferring large amount of data one can use System V shared memory in which case two processes can access the same portion of main memory. Note that even in this case, one needs to employ a synchronization mechanism, like System V semaphores, which result in context switches as well.
This is why when processes need to communicate often, it is better to make them threads in a single process.

Is Pthread library actually a user thread solution?

The title might not be clear enough because I don't know how to define my questions actually.
I understand Pthread is a thread library meeting POSIX standard (about POSIX, see wikipedia: http://en.wikipedia.org/wiki/Posix). It is available in Unix-like OS.
About thread, I read that there are three different models:
User level thread: the kernel does not know it. User himself creates/implements/destroy threads.
Kernel level thread: kernel directly supports multiple threads of control in a process.
Light weight process(LWP): scheduled by kernel but can be bounded with user threads.
Did you see my confusion? When I call pthread_create() to create a thread, did I create a user level thread? I guess so. So can I say, Pthread offers a user level solution for threads? It can not manipulate kernel/LWP?
#paulsm4 I am doubtful about your comment that kernel knows every thing. In this particular context of user level threads, the kernel is unaware of the fact that such a thing is happening. A user level thread's scheduling is maintained by the user himself (via the interface provided by a library) and the kernel ends up allotting just a single kernel thread to the whole process. Kernel would treat the process as a single threaded and any blocking call by one of the threads would end up blocking all the threads of that process.
Refer to http://www.personal.kent.edu/~rmuhamma/OpSystems/Myos/threads.htm
In Linux, pthread is implemented as a lightweight process. Kernel (v2.6+) is actually implemented with NPTL. Let me quote the wiki content:
NPTL is a so-called 1×1 threads library, in that threads created by the user (via the pthread_create() library function) are in 1-1 correspondence with schedulable entities in the kernel (tasks, in the Linux case). This is the simplest possible threading implementation.
So pthread in linux kernel is actually implemented as kernel thread.
pthreads, per se, isn't really a threading library. pthreads is the interface which a specific threading library implements, using the concurrency resources available on that platform. So there's a pthreads implementation on linux, on bsd, on solaris, etc., and while the interface (the header files and the meaning of the calls) is the same, the implementation of each is different.
So what pthread_create actually does, in terms of kernel thread objects, varies between OSes and pthread library implementations. At a first approximation, you don't need to know (that's stuff that the pthread abstraction allows you to not need to know about). Eventually you might need to see "behind the curtain", but for most pthread users that's not necessary.
If you want to know what a /specific/ pthread implementation does, on a specific OS, you'll need to clarify your question. What Solaris and Linux do, for example, is very different.
Q: I understand Pthread is a thread library meeting POSIX standard
A: Yes. Actually, "Pthreads" stands for "Posix threads":
http://en.wikipedia.org/wiki/Pthreads
Q: It is available in Unix-like OS.
A: Actually, it's available for many different OSs ... including Windows, MacOS ... and, of course, Linux, BSD and Solaris.
Q: About thread, I read that there are three different models
Now you're getting fuzzy. "Threads" is a very generic term. There are many, many different models. And many, many different ways you can characterize and/or implement "threads". Including stuff like the Java threading model, or the Ada threading model.
Q: When I call pthread_create() to create a thread, did I create a
user level thread?
A: Yes: Just about everything you do in user space is "protected" in your own, private "user space".
Q: User level thread: the kernel does not know it.
A: No. The kernel knows everything :)
Q: Kernel level thread: kernel directly supports multiple threads of
control in a process.
A: Yes, there is such a thing as "kernel threads".
And, as it happens, Linux makes EXTENSIVE use of kernel threads. For example, every single process in a Linux system is a "kernel thread". And every user-created pthread is ALSO implemented as a new "kernel thread". As are "worker threads" (which are completely invisible to any user-level process).
But this is an advanced topic you do NOT need to understand in order to effectively use pthreads. Here's a great book that discussed this - and many other topics - in detail:
Linux Kernel Development, Robert Love
Remember: "Pthreads" is an interface. How it's implemented depends on the platform. Linux uses kernel threads; Windows uses Win32 threads, etc.
===========================================================================
ADDENDUM:
Since people still seem to be hitting this old thread, I thought it would be useful to reference this post:
https://stackoverflow.com/a/11255174/421195
Linux typically uses two implementations of pthreads:
LinuxThreads and Native
POSIX Thread Library(NPTL),
although the former is largely obsolete. Kernel from 2.6 provides
NPTL, which provides much closer conformance to SUSv3, and perform
better especially when there are many threads.
You can query the
specific implementation of pthreads under shell using command:
getconf GNU_LIBPTHREAD_VERSION
You can also get a more detailed implementation difference in The
Linux Programming Interface.
"Pthreads" is a library, based on the Posix standard. How a pthreads library is implemented will differ from platform to platform and library to library.
I find previous answers not as satisfying or clear as I would have liked so here goes:
When you call
pthread_create(...)
you always create a new user-level thread. And assuming that there is OS, there is always one or more kernel thread...but let's dive deeper:
According to "Operating system concepts" 10th edition,the actual classification we should be looking at (when it comes to thread libraries) is how the user level threads are mapped onto kernel threads (and that's what the question really meant).
The models are one to one (each user-level thread within a single process is mapped to a different kernel thread),many to one (the thread library is "user level" so all of the different threads within a single process are mapped to a single kernel thread,and the threads data structures, context switch etc are dealt with at user level and not by the OS [meaning that if a thread blocks on some I/O call, the entire process might potentially block]), and many to many (something in between,obviously the number of user-level threads is greater or equal to the number of kernel threads it is being mapped onto).
Now,pthreads is a specification and not an implementation, and the implementation does depend on the OS to which it is written. It could be any one of those models (notice that "many to many" is very flexible).
So,as an example,on Linux and Windows (the most popular OSs for years now,where the model is "one to one") the implementation is "one to one".
Pthreads is just a standardized interface for threading libraries. Whether an OS thread or a lightweight thread is created depends on the library you use. Nevertheless, my first guest would be that your threads are “real” OS-level threads.

Resources