threads and LWP in Linux - linux

Is this sentence correct: "All threads in Linux are LWPs, but not all LWPs are threads"? I'm actually trying to understand how threads are implemented in Linux. pthread_create calls the clone syscall, but in man clone I didn't find any reference to LWPs.
So, does Linux have LWPs at all?

You can find your answer in this blog post: http://www.thegeekstuff.com/2013/11/linux-process-and-threads/
Threads in Linux are nothing but flows of execution within a process. A
process containing multiple execution flows is known as a multi-threaded
process.
For a non-multi-threaded process there is only one execution flow, the
main execution flow, and hence it is also known as a single-threaded
process. For the Linux kernel, there is no concept of a thread. Each thread
is viewed by the kernel as a separate process, but these processes are
somewhat different from other normal processes. I will explain the
difference in the following paragraphs.
Threads are often confused with the term Light Weight Processes, or LWPs.
The reason dates back to the times when Linux supported threads at
user level only. This meant that even a multi-threaded application was
viewed by the kernel as a single process. This posed big challenges
for the library that managed these user-level threads, because it had
to make sure that one thread's execution was not hindered when any
other thread issued a blocking call.
Later on the implementation changed and a process was attached to
each thread so that the kernel could take care of them. But, as discussed
earlier, the Linux kernel does not see them as threads; each thread is
viewed as a process inside the kernel. These processes are known as light
weight processes.
The main difference between a light weight process (LWP) and a normal
process is that LWPs share the same address space and other resources, like
open files, etc. Because some resources are shared, these processes are
considered light weight compared to other normal processes, and hence
the name light weight processes.
So, effectively we can say that threads and light weight processes are
the same. It's just that "thread" is the term used at user level, while
"light weight process" is the term used at kernel level.
From an implementation point of view, threads are created using the functions
exposed by the POSIX-compliant pthread library on Linux. Internally, the
clone() function is used to create both normal and light weight
processes. To create a normal process, fork() is used, which in turn
calls clone() with appropriate arguments, while to create a thread or
LWP, a function from the pthread library calls clone() with the
relevant flags. So, the main difference comes from the different flags
that can be passed to the clone() function.
Read more about fork() and clone() on their respective man pages.
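To make that last point concrete, here is a minimal sketch of creating a LinuxThreads-style LWP directly with clone(). It is not how glibc actually implements pthread_create (whose flag set also includes CLONE_THREAD, CLONE_SYSVSEM, CLONE_SETTLS and the TID-handling flags); it only shows how the flags decide what is shared. The new task shares the address space, filesystem information, file descriptors and signal handlers with its creator; dropping those flags would give fork()-like behaviour.

    /* Build with: gcc -Wall lwp.c -o lwp */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/syscall.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Entry point of the cloned task. */
    static int worker(void *arg)
    {
        printf("child : pid=%ld tid=%ld\n",
               (long)getpid(), (long)syscall(SYS_gettid));
        return 0;
    }

    int main(void)
    {
        const size_t stack_size = 1024 * 1024;
        char *stack = malloc(stack_size);   /* stack grows down on x86 */
        if (!stack) { perror("malloc"); return 1; }

        /* Sharing VM, fs info, files and signal handlers is what makes the
         * new task "light weight"; without these flags clone() behaves much
         * like fork(). SIGCHLD in the low byte lets waitpid() work normally. */
        int pid = clone(worker, stack + stack_size,
                        CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
                        NULL);
        if (pid == -1) { perror("clone"); free(stack); return 1; }

        printf("parent: pid=%ld tid=%ld, child pid=%d\n",
               (long)getpid(), (long)syscall(SYS_gettid), pid);
        waitpid(pid, NULL, 0);
        free(stack);
        return 0;
    }

Because CLONE_THREAD is not passed here, the child shows up with its own PID, which is how old LinuxThreads LWPs looked; NPTL additionally passes CLONE_THREAD so that all threads report the same PID.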

Related

Why does NPTL threading in Linux still assign a unique PID to each thread?

I am reading the pthreads man page and see the following:
With NPTL, all of the threads in a process are placed in the same
thread group; all members of a thread group share the same PID.
My system is running NPTL 2.17, and when I run htop with threads shown I see that all PIDs are unique. But why? I was expecting some of them (e.g. chrome) to share the same PID with each other.
See man gettid:
gettid() returns the caller's thread ID (TID). In a single-threaded
process, the thread ID is equal to the process ID (PID, as returned
by getpid(2)). In a multithreaded process, all threads have the same
PID, but each one has a unique TID. For further details, see the
discussion of CLONE_THREAD in clone(2).
What htop shows is the TID, not the PID. You can toggle the display of threads on/off with the H key.
You can also enable the PPID column in htop, which for threads shows the PID/TID of the main thread.
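A quick way to see this for yourself is the small sketch below (build with gcc -pthread; syscall(SYS_gettid) is used because the gettid() libc wrapper only appeared in glibc 2.30):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Print the POSIX PID and the kernel TID of the calling thread. */
    static void *report(void *name)
    {
        printf("%-6s: pid=%ld tid=%ld\n", (const char *)name,
               (long)getpid(), (long)syscall(SYS_gettid));
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        report("main");
        pthread_create(&t, NULL, report, "thread");
        pthread_join(t, NULL);
        return 0;
    }

Both lines print the same pid but different tids; those tids are the numbers htop lists.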
Google's documentation for Chromium (which probably operates similarly to Chrome when it comes to these concepts) states that they use a "multi-process architecture". Your quote from pthread's man page states that all of the threads in a single process are placed under the same PID, which would not apply to Chrome's architecture.
Because kernel-level threads are no more than processes with the (nearly) same address space.
This was "solved" by the Linux kernel developers by renaming the processes to "threads" and the "pid"s to "tid"s, while the old processes became "thread groups".
However, the sad truth is that if you create a thread on Linux (clone()), it will create a process - just one using the (nearly) same memory segments.
That is the 1:1 thread model. It means that all threads are actually kernel-level threads, i.e. they are essentially processes in the same address space.
Some other alternatives would be:
1:M thread model. The kernel doesn't know about threads at all; it is the task of user-space libraries to provide "in-process multitasking" so the program runs apparently multi-threaded.
N:M thread model. This is the best, although some opinions still favor 1:1. It means there are both user- and kernel-level threads, and some optimization algorithm decides what to run and where.
Linux once had an N:M implementation (NGPT), but it was removed. One reason is that Linux kernel calls are inherently synchronous (blocking), so some kernel cooperation would have been needed even for user-space synchronization, and nobody wanted to do that.
So that's how it is.
P.S. To build a well-performing app, you should actually avoid creating a lot of threads on the fly. Use a thread pool with well-thought-out locking protocols (a sketch follows below). If you don't minimize thread creations/joins, your app will be slow and inefficient, no matter whether it is N:M or not.
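For illustration, here is a minimal fixed-size thread pool sketch. The names (submit, worker) and the queue size are made up for this example; a real pool would also handle queue overflow, errors and task return values.

    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 4
    #define QSIZE    64

    typedef void (*task_fn)(void *);
    typedef struct { task_fn fn; void *arg; } task_t;

    static task_t queue[QSIZE];
    static int head, tail, count, shutting_down;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

    /* Each worker thread repeatedly pulls a task off the queue and runs it. */
    static void *worker(void *unused)
    {
        (void)unused;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (count == 0 && !shutting_down)
                pthread_cond_wait(&not_empty, &lock);
            if (count == 0 && shutting_down) {
                pthread_mutex_unlock(&lock);
                return NULL;
            }
            task_t t = queue[head];
            head = (head + 1) % QSIZE;
            count--;
            pthread_mutex_unlock(&lock);
            t.fn(t.arg);                /* run the task outside the lock */
        }
    }

    static void submit(task_fn fn, void *arg)
    {
        pthread_mutex_lock(&lock);
        queue[tail] = (task_t){ fn, arg };  /* no overflow check in this sketch */
        tail = (tail + 1) % QSIZE;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }

    static void hello(void *arg) { printf("task %ld\n", (long)arg); }

    int main(void)
    {
        pthread_t workers[NWORKERS];
        for (int i = 0; i < NWORKERS; i++)
            pthread_create(&workers[i], NULL, worker, NULL);
        for (long i = 0; i < 10; i++)
            submit(hello, (void *)i);

        pthread_mutex_lock(&lock);
        shutting_down = 1;              /* workers drain the queue, then exit */
        pthread_cond_broadcast(&not_empty);
        pthread_mutex_unlock(&lock);
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(workers[i], NULL);
        return 0;
    }

The point is that the NWORKERS threads are created once and reused for every submitted task, so the cost of clone()/join is paid only at startup and shutdown.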
The Linux kernel does have the concept of POSIX PIDs (explorable in /proc/*), but it calls them thread group IDs in the kernel source, and it refers to its internal thread IDs as PIDs (explorable in /proc/*/task/*).
I believe this is rooted in Linux's original treatment of threads as "just processes" that happen to share address spaces and a bunch of other stuff with each other.
Your user tool is likely propagating this perhaps confusing Linux kernel terminology.
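You can poke at that layout from a program, too. Here is a small sketch (gcc -pthread) that creates one extra thread and then lists the entries of /proc/self/task, which are the kernel's per-task IDs inside this one thread group:

    #define _GNU_SOURCE
    #include <dirent.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static void *idle_thread(void *arg)
    {
        pause();            /* keep the thread alive while we list the tasks */
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, idle_thread, NULL);

        DIR *d = opendir("/proc/self/task");
        if (!d) { perror("opendir"); return 1; }

        struct dirent *e;
        printf("thread group (PID) %ld contains tasks:", (long)getpid());
        while ((e = readdir(d)) != NULL)
            if (e->d_name[0] != '.')    /* skip "." and ".." */
                printf(" %s", e->d_name);
        printf("\n");
        closedir(d);
        return 0;
    }

With one extra thread you should see two TIDs, one of which equals getpid().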

How are a process and a thread the same thing in Linux?

I have read that a process and a thread are the same thing in Linux, for example in this question it says:
There is absolutely no difference between a thread and a process on
Linux.
But I don't understand how a process and a thread can mean the same thing. I mean, a thread is what gets executed by the CPU, and a process is simply an "enclosure" for the threads which allows the threads to have shared memory. This image shows the relationship between a process and its threads:
So clearly a process and a thread do not mean the same thing!
Linux didn't use to have special support for (POSIX) threads; it simply treated them as processes that shared their address space as well as a few other resources (file descriptors, signal actions, ...) with other "processes".
That implementation, while elegant, made certain things POSIX requires of threads difficult, so Linux did end up gaining that special support for threads, and your premise is now no longer true.
Nevertheless, processes and threads both remain represented as tasks within the kernel, but now the kernel has support for grouping those tasks into thread groups as well, and APIs for working with them (tgkill, tkill, exit_group, ...); a small sketch of one of those calls follows below.
You can google LinuxThreads and NPTL threads to learn more about the topic.
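As an illustration of those thread-group-aware APIs, here is a rough sketch that delivers a signal to one specific task inside the caller's thread group with tgkill(tgid, tid, sig). The sleep-based handshake is a deliberate simplification for brevity; real code would synchronize properly, and most programs would just use pthread_kill instead.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static pid_t worker_tid;

    static void on_usr1(int sig) { (void)sig; /* nothing to do */ }

    static void *worker(void *arg)
    {
        worker_tid = syscall(SYS_gettid);   /* publish our kernel TID */
        pause();                            /* sleep until a signal arrives */
        return NULL;
    }

    int main(void)
    {
        signal(SIGUSR1, on_usr1);           /* disposition is process-wide */

        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        sleep(1);                           /* crude: let the worker publish its TID */

        /* Target exactly one task (TID) inside our thread group (PID). */
        syscall(SYS_tgkill, getpid(), worker_tid, SIGUSR1);
        pthread_join(t, NULL);
        printf("thread %ld handled the signal and exited\n", (long)worker_tid);
        return 0;
    }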

Mapping User-level threads and Kernel-level threads

How are User-level threads mapped to Kernel-level threads?
It varies by implementation. The three most common threading models are:
1-to-1: Each user-level thread has a corresponding entity that is scheduled by the kernel.
n-to-1: Each process is scheduled by the kernel. Thread scheduling takes place entirely in user space.
n-to-m: Each process has a pool of entities that are scheduled by the kernel. These are assigned to run particular user-level threads by a user-space scheduler that is part of the process.
Modern implementations are almost all 1-to-1.
There's a bit of confusion about the terminology used for referring to ULTs and KLTs.
The following are two different interpretations. Please correct me if I got this wrong:
KLTs are needed to achieve concurrency in the kernel (note the interpretation of the kernel as a process or a live entity). This is true of microkernels like Symbian, where a kernel thread is responsible for every hardware resource of the system (e.g. file server, location server, calendar server, etc.). However, in a kernel like Linux, which is mostly a library (and not a process or a live entity of its own), there is really no meaning for kernel threads. In Linux, every thread you create is treated by the kernel as a process, and the kernel always runs either in process context or in interrupt context.
The second interpretation is based on whether threading (or concurrency) is visible to the kernel or not. For instance, using setjmp/longjmp one can achieve concurrency in user space. As already discussed, the kernel is totally unaware of this; such concurrency may be termed a ULT. A thread whose creation the kernel is aware of (one created using the clone() system call) may be called a KLT.
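As a sketch of that invisible-to-the-kernel kind of concurrency, here is a tiny cooperative example using the ucontext API (makecontext/swapcontext, a close cousin of the setjmp/longjmp trick mentioned above; the fiber names are made up). Everything below runs inside a single kernel task, so top/htop will show only one thread:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, a_ctx, b_ctx;
    static char a_stack[64 * 1024], b_stack[64 * 1024];

    static void fiber_a(void)
    {
        for (int i = 0; i < 3; i++) {
            printf("fiber A, step %d\n", i);
            swapcontext(&a_ctx, &b_ctx);    /* yield to fiber B */
        }
    }

    static void fiber_b(void)
    {
        for (int i = 0; i < 3; i++) {
            printf("fiber B, step %d\n", i);
            swapcontext(&b_ctx, &a_ctx);    /* yield back to fiber A */
        }
    }

    int main(void)
    {
        getcontext(&a_ctx);
        a_ctx.uc_stack.ss_sp = a_stack;
        a_ctx.uc_stack.ss_size = sizeof a_stack;
        a_ctx.uc_link = &main_ctx;          /* return here when A finishes */
        makecontext(&a_ctx, fiber_a, 0);

        getcontext(&b_ctx);
        b_ctx.uc_stack.ss_sp = b_stack;
        b_ctx.uc_stack.ss_size = sizeof b_stack;
        b_ctx.uc_link = &main_ctx;
        makecontext(&b_ctx, fiber_b, 0);

        swapcontext(&main_ctx, &a_ctx);     /* start fiber A */
        return 0;
    }

The two fibers interleave their output purely by user-space context switches; the kernel only ever schedules the one process.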

When are clone() and fork() better than pthreads?

I am a beginner in this area.
I have studied fork(), vfork(), clone() and pthreads.
I have noticed that pthread_create() creates a thread, with less overhead than creating a new process with fork(). Additionally, the thread shares file descriptors, memory, etc. with the parent process.
But when are fork() and clone() better than pthreads? Can you please explain it to me with a real-world example?
Thanks in advance.
clone(2) is a Linux-specific syscall mostly used to implement threads (in particular, it is used by pthread_create). With various arguments, clone can also have fork(2)-like behavior. Very few people use clone directly; using the pthread library is more portable. You probably need to call the clone(2) syscall directly only if you are implementing your own thread library - a competitor to POSIX threads - and this is very tricky (in particular because locking may require using the futex(2) syscall in machine-tuned, assembly-coded routines; see futex(7)). You don't want to use clone or futex directly, because pthreads are much simpler to use.
(The other pthread functions require some book-keeping to be done internally in libpthread.so after a clone during a pthread_create)
As Jonathon answered, processes have their own address space and file descriptor set. A process can execute a new program with the execve syscall, which basically reinitializes the address space, the stack and the registers to start a new program (but file descriptors may be kept, unless the close-on-exec flag is set, e.g. via O_CLOEXEC for open).
On Unix-like systems, all processes (except the very first process, usually init, of pid 1) are created by fork (or variants like vfork; you could, but don't want to, use clone in such a way that it behaves like fork).
(Technically, on Linux, there are a few weird exceptions which you can ignore, notably kernel processes or threads and some rare kernel-initiated process launches like /sbin/hotplug ....)
The fork and execve syscalls are central to Unix process creation (with waitpid and related syscalls).
A multi-threaded process has several threads (usually created by pthread_create), all sharing the same address space and file descriptors. You use threads when you want to work in parallel on the same data within the same address space, but then you have to take care of synchronization and locking (see the sketch below). Read a pthreads tutorial for more.
I suggest you read a good Unix programming book, like Advanced Unix Programming, and/or the (freely available) Advanced Linux Programming.
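A tiny sketch of that synchronization point, assuming nothing beyond plain pthreads (gcc -pthread): two threads bump a shared counter, and the mutex is what keeps the result correct.

    #include <pthread.h>
    #include <stdio.h>

    static long counter;
    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *bump(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&counter_lock);
            counter++;                      /* shared data, so take the lock */
            pthread_mutex_unlock(&counter_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump, NULL);
        pthread_create(&t2, NULL, bump, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld (expected 2000000)\n", counter);
        return 0;
    }

Remove the lock/unlock pair and the final count will usually come out short, which is exactly the kind of data race that fork()-based designs avoid by not sharing memory.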
The strength and weakness of fork (and company) is that they create a new process that's a clone of the existing process.
This is a weakness because, as you pointed out, creating a new process has a fair amount of overhead. It also means communication between the processes has to be done via some "approved" channel (pipes, sockets, files, shared-memory region, etc.)
This is a strength because it provides (much) greater isolation between the parent and the child. If, for example, a child process crashes, you can kill it and start another fairly easily. By contrast, if a child thread dies, killing it is problematic at best -- it's impossible to be certain what resources that thread held exclusively, so you can't clean up after it. Likewise, since all the threads in a process share a common address space, one thread that ran into a problem could overwrite data being used by all the other threads, so just killing that one thread wouldn't necessarily be enough to clean up the mess.
In other words, using threads is a bit of a gamble. As long as your code is all clean, you can gain some efficiency by using multiple threads in a single process. Using multiple processes adds a bit of overhead, but can make your code quite a bit more robust, because it limits the damage a single problem can cause, and makes it much easier to shut down and replace a process if it does run into a major problem.
As far as concrete examples go, Apache might be a pretty good one. It will use multiple threads per process, but to limit the damage in case of problems (among other things) it limits the number of threads per process, and can/will spawn several separate processes running concurrently as well. On a decent server you might have, for example, 8 processes with 8 threads each. The large number of threads helps it service a large number of clients in a mostly I/O-bound task, and breaking it up into processes means that if a problem does arise, it doesn't suddenly become completely unresponsive, and it can shut down and restart a process without losing much.
These are totally different things. fork() creates a new process. pthread_create() creates a new thread, which runs under the context of the same process.
Threads share the same virtual address space, memory (for good or for bad), and set of open file descriptors, among other things.
Processes are (essentially) totally separate from each other and cannot modify each other.
You should read this question:
What is the difference between a process and a thread?
As for an example, if I am your shell (e.g. bash), when you enter a command like ls, I am going to fork() a new process and then exec() the ls executable. (And then I wait() on the child process, but that's getting out of scope.) This happens in an entirely different address space, and if ls blows up, I don't care, because I am still executing in my own process.
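That shell scenario looks roughly like this minimal sketch (error handling trimmed to the essentials):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid == -1) { perror("fork"); return 1; }

        if (pid == 0) {
            /* child: replace this process image with "ls" */
            execlp("ls", "ls", "-l", (char *)NULL);
            perror("execlp");           /* only reached if exec fails */
            _exit(127);
        }

        int status;
        waitpid(pid, &status, 0);       /* parent: wait for the child to finish */
        printf("ls exited with status %d\n", WEXITSTATUS(status));
        return 0;
    }

If the exec'd program crashes, only the child dies; the parent just sees a nonzero exit status from waitpid.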
On the other hand, say I am a math program, and I have been asked to multiply two 100x100 matrices. We know that matrix multiplication is an embarrassingly parallel problem. So, I have the matrices in memory. I spawn N threads, each of which operates on the same source matrices, putting its results in the appropriate location in the result matrix. Remember, these operate in the context of the same process, so I need to make sure they are not stamping on each other's data. If N is 8 and I have an eight-core CPU, I can effectively calculate each part of the matrix simultaneously.
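Here is a sketch of that idea with pthreads (SIZE, NTHREADS and the split into row bands are arbitrary choices for the example; because every thread writes a disjoint set of rows of C, no locking is needed):

    #include <pthread.h>
    #include <stdio.h>

    #define SIZE     100
    #define NTHREADS 4

    static double A[SIZE][SIZE], B[SIZE][SIZE], C[SIZE][SIZE];

    /* Each thread computes its own contiguous band of rows of C = A * B. */
    static void *multiply_band(void *arg)
    {
        long id = (long)arg;
        int first = id * SIZE / NTHREADS;
        int last  = (id + 1) * SIZE / NTHREADS;

        for (int i = first; i < last; i++)
            for (int j = 0; j < SIZE; j++) {
                double sum = 0.0;
                for (int k = 0; k < SIZE; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
        return NULL;
    }

    int main(void)
    {
        /* Fill A with something recognizable and make B the identity,
         * so the result C should equal A. */
        for (int i = 0; i < SIZE; i++)
            for (int j = 0; j < SIZE; j++) {
                A[i][j] = i + j;
                B[i][j] = (i == j);
            }

        pthread_t workers[NTHREADS];
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&workers[t], NULL, multiply_band, (void *)t);
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(workers[t], NULL);

        printf("C[10][10] = %g (expected %g)\n", C[10][10], A[10][10]);
        return 0;
    }

Built with gcc -pthread, all the threads read and write the same global arrays because they live in the one shared address space of the process.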
The process creation mechanism on Unix using fork() (and family) is very efficient.
Moreover, most Unix systems do not support kernel-level threads, i.e. a thread is not an entity recognized by the kernel. Hence a thread on such a system cannot benefit from CPU scheduling at the kernel level. The pthread library does the scheduling there, which is not the kernel but the process itself.
Also, on such systems pthreads are implemented using vfork() and as light weight processes only.
So using threading has no point except portability on such systems.
As per my understanding, Sun Solaris and Windows have kernel-level threads, while the Linux family doesn't support kernel threads.
With processes, pipes and Unix domain sockets are very efficient IPC without synchronization issues.
I hope this clears up why and when threads should be used in practice.

What is the difference between a lightweight process and a thread?

I found an answer to the question here. But I don't understand some ideas in the answer. For instance, a lightweight process is said to share its logical address space with other processes. What does that mean? I can understand the same situation with two threads: both of them share one address space, so both of them can read any variables from the bss segment (for example). But we've got a lot of different processes with different bss sections, and I don't know how to share all of them.
I am not sure that the answers here are correct, so let me post my version.
There is a difference between a process, an LWP (lightweight process) and a user thread. I will leave the process definition aside, since that's more or less known, and focus on LWPs vs user threads.
An LWP is essentially what we call a thread today. Originally, a user thread meant a thread that is managed by the application itself, which the kernel does not know anything about.
An LWP, on the other hand, is a unit of scheduling and execution by the kernel.
Example:
Let's assume that the system is running 3 other processes, scheduling is round-robin without priorities, and you have 1 processor/core.
Option 1: You have 2 user threads using one LWP. From the OS perspective, that means you have ONE scheduling unit. In total there are 4 LWPs running (3 others + 1 yours). Your LWP gets 1/4 of the total CPU time, and since you have 2 user threads, each of them gets 1/8 of the total CPU time (depending on your implementation).
Option 2: You have 2 LWPs. From the OS perspective, you have TWO scheduling units. In total there are 5 LWPs running. Your LWPs get 1/5 of the total CPU time EACH, so your application gets 2/5 of the CPU.
Another rough difference: an LWP has a pid (process id); user threads do not.
For some reason the naming got a little mixed up, and we now refer to LWPs as threads.
There are definitely more differences, but please refer to these slides:
http://www.cosc.brocku.ca/Offerings/4P13/slides/threads.ppt
EDIT:
After posting, I found a good article that explains everything in more detail, and in better English than mine:
http://www.thegeekstuff.com/2013/11/linux-process-and-threads/
From MSDN, Threads and Processes:
Processes exist in the operating system and correspond to what users
see as programs or applications. A thread, on the other hand, exists
within a process. For this reason, threads are sometimes referred to
as light-weight processes. Each process consists of one or more
threads.
Based on Tanenbaum's book "Distributed Systems", a light weight process generally refers to a hybrid of user-level and kernel-level threads. An LWP runs in the context of a single process, and there can be several LWPs per process. In addition, each LWP can be running its own (user-level) thread. Multi-threaded applications are constructed by creating threads (with a thread library package) and subsequently assigning each thread to an LWP.
The biggest advantage of this hybrid approach is that creating, destroying, and synchronizing threads is relatively cheap and does not need any kernel intervention. Besides that, provided a process has enough LWPs, a blocking system call will not suspend the entire process.
Threads run within processes.
Each process may contain one or more threads.
If the kernel doesn't know anything about the threads running in a process, we have threads running in user space and thus no multiprocessing capabilities are available.
On the other hand, we can have threads running in kernel space; this means that each one can run on a different CPU. This gives us multiprocessing, but as you may assume it is more expensive in terms of operating system resources.
Finally, there is a solution that lies somewhere in the middle: we group threads together into LWPs. Each group can run on a different CPU, but the threads within a group cannot be multiprocessed, because the kernel in this scheme knows only about the groups (which are multiprocessed) and nothing about the threads they contain.
Hope that is clear enough.
IMO, an LWP is a kernel thread binding which can be created and executed in the user context.
If I'm not mistaken, you can attach user threads to a single LWP to potentially increase the level of concurrency without involving a system call.
A thread is basically a task assigned one goal, with enough information to perform that specific task.
A process can create multiple threads to do its work as fast as possible.
E.g., one portion of the program may need to do input/output, while another portion needs permissions.
User-level threads are those that can be handled by the thread library.
On the other hand, kernel-level threads (which need to deal with hardware) are also called LWPs (light weight processes); they maximize the use of the system so that it does not halt upon just one system call.
From here.
Each LWP is a kernel resource in a kernel pool, and it is attached to and detached from a thread on a per-thread basis. This happens as threads are scheduled or created and destroyed.
A process contains one or more threads, and a thread can do anything a process can do. Threads within a process share the same address space, so the cost of communication between threads is low: they use the same code section, data section, and OS resources. All of these features make a thread a "lightweight process".
