when to use process v/s thread? - multithreading

I know the theoretical difference between the thread and process. But in practical when to use the thread and process because both will does the same work.

In general (and it varies by operating system):
Threads are usually lighter-weight than processes
Processes provide better isolation between actions
Threads provide simpler data sharing and coordination within the process
Typically the middle point is the kicker for me - if you really, really don't want two actions to interfere with each other, to the extent that one process going belly-up doesn't affect the other action, use separate processes. Otherwise I personally go for threads.
(I'm assuming that both models are available - if you want to run a separate executable, that's going to be pretty hard to do within an existing thread, at least in most environments I'm aware of.)

Thread is a subtotal of a process. Hereby the main difference is memory allocation and CPU time scheduling:
operating system handles memory per process and schedules execution time for processes
you allocate memory (within the bounds allowed per process) and you schedule execution time (within given execution timeframe per process) for threads
Other than that there's a lot of minor defining differences, like hardware allocation (threads can share hardware locked by their process), communication (depending on the platform/language/runtime, threads can share variables, processes need a pipe to share information) etc. There's much more in this distinction if you think of a thread as of an atomic entity, whilst process in that case would be the way to group these entities.

Related

fork vs thread on one single core

Imagine that I have two tasks, each of them needs 2 seconds to finish its job.
In this case, if I create two threads for each of them and my PC is single-core, this won't save any time. Am I right ?
What if I use fork to create two processes (the machine is still single-core) and each process takes charge of one task ? Can this save any time ?
If not, I have a question:
In current modern machine (including multi-core), if I have several heavy tasks, which method should I use ?
fork ?
thread ?
fork + thread, meaning that create some processes and
each process contains more than one thread ?
Even with a single core having two threads may speed up execution. If your routine is purely CPU bound then two threads won't improve anything, indeed the performance will be worse because of context switching overhead. But if the routine has to wait for memory, disk or or network (which is usually the case) then two threads will provide performance gains even with a single core.
About fork vs threads, threads require less resources so, in principle, should be the first choice. But there are two caveats: 1) maybe you want to be able to terminate a parallel routine, this is much safer to do with processes than with threads and 2) some languages (notably Python and Ruby) provide pseudo-thread libraries which do not use real threads but switch between routines using the same thread. This simulated threading can be very useful for example when waiting for network requests but it must be taken into account that it's not real multithreading.
Amendment: As commented by Sergio Tulentsev, Ruby and Python do indeed provide real threads and not only coroutines.
"job takes 2 seconds" - If those 2 seconds are fully occupying the CPU (100% load), you won't gain anything with either thread nor fork if you have no cores to share. The single-core CPU is simply busy and you cannnot make it more busy.
In case this 2 seconds include waiting time (for example on I/O, storage, whatever) you could gain something, even with a single core. The amount of gain depends on the CPU working vs. CPU waiting ratio and the overhead of your multiprocessing. Most non-trivial programs have at least some amount of "CPU waiting", so multithreading is often useful even on single-core CPUs.
This overhead for setting up a coroutine and context switching can be considerable and needs to be measured. Obviously, the shorter the run time of your actiual task is, the larger will be the ratio of overhead (for setting up a thread or process, etc.) and the smaller will be you multi-processing gain.
Traditionally, threads used to have considerably less overhead than processes (after all, that was why they were invented), but the "considerably" has maybe vanished over time - On modern Linux systems, processes are only a tad slower to set up than threads (actually, both use the same system calls). You rather decide between thread or process based on the requirements related to amount of protection (or sharing) of data than execution speed.

How do user level threads (ULTs) and kernel level threads (KLTs) differ with regards to concurrent execution?

Here's what I understand; please correct/add to it:
In pure ULTs, the multithreaded process itself does the thread scheduling. So, the kernel essentially does not notice the difference and considers it a single-thread process. If one thread makes a blocking system call, the entire process is blocked. Even on a multicore processor, only one thread of the process would running at a time, unless the process is blocked. I'm not sure how ULTs are much help though.
In pure KLTs, even if a thread is blocked, the kernel schedules another (ready) thread of the same process. (In case of pure KLTs, I'm assuming the kernel creates all the threads of the process.)
Also, using a combination of ULTs and KLTs, how are ULTs mapped into KLTs?
Your analysis is correct. The OS kernel has no knowledge of user-level threads. From its perspective, a process is an opaque black box that occasionally makes system calls. Consequently, if that program has 100,000 user-level threads but only one kernel thread, then the process can only one run user-level thread at a time because there is only one kernel-level thread associated with it. On the other hand, if a process has multiple kernel-level threads, then it can execute multiple commands in parallel if there is a multicore machine.
A common compromise between these is to have a program request some fixed number of kernel-level threads, then have its own thread scheduler divvy up the user-level threads onto these kernel-level threads as appropriate. That way, multiple ULTs can execute in parallel, and the program can have fine-grained control over how threads execute.
As for how this mapping works - there are a bunch of different schemes. You could imagine that the user program uses any one of multiple different scheduling systems. In fact, if you do this substitution:
Kernel thread <---> Processor core
User thread <---> Kernel thread
Then any scheme the OS could use to map kernel threads onto cores could also be used to map user-level threads onto kernel-level threads.
Hope this helps!
Before anything else, templatetypedef's answer is beautiful; I simply wanted to extend his response a little.
There is one area which I felt the need for expanding a little: combinations of ULT's and KLT's. To understand the importance (what Wikipedia labels hybrid threading), consider the following examples:
Consider a multi-threaded program (multiple KLT's) where there are more KLT's than available logical cores. In order to efficiently use every core, as you mentioned, you want the scheduler to switch out KLT's that are blocking with ones that in a ready state and not blocking. This ensures the core is reducing its amount of idle time. Unfortunately, switching KLT's is expensive for the scheduler and it consumes a relatively large amount of CPU time.
This is one area where hybrid threading can be helpful. Consider a multi-threaded program with multiple KLT's and ULT's. Just as templatetypedef noted, only one ULT can be running at one time for each KLT. If a ULT is blocking, we still want to switch it out for one which is not blocking. Fortunately, ULT's are much more lightweight than KLT's, in the sense that there less resources assigned to a ULT and they require no interaction with the kernel scheduler. Essentially, it is almost always quicker to switch out ULT's than it is to switch out KLT's. As a result, we are able to significantly reduce a cores idle time relative to the first example.
Now, of course, all of this depends on the threading library being used for implementing ULT's. There are two ways (which I can come up with) for "mapping" ULT's to KLT's.
A collection of ULT's for all KLT's
This situation is ideal on a shared memory system. There is essentially a "pool" of ULT's to which each KLT has access. Ideally, the threading library scheduler would assign ULT's to each KLT upon request as opposed to the KLT's accessing the pool individually. The later could cause race conditions or deadlocks if not implemented with locks or something similar.
A collection of ULT's for each KLT (Qthreads)
This situation is ideal on a distributed memory system. Each KLT would have a collection of ULT's to run. The draw back is that the user (or the threading library) would have to divide the ULT's between the KLT's. This could result in load imbalance since it is not guaranteed that all ULT's will have the same amount of work to complete and complete roughly the same amount of time. The solution to this is allowing for ULT migration; that is, migrating ULT's between KLT's.

Cost of a thread

I understand how to create a thread in my chosen language and I understand about mutexs, and the dangers of shared data e.t.c but I'm sure about how the O/S manages threads and the cost of each thread. I have a series of questions that all relate and the clearest way to show the limit of my understanding is probably via these questions.
What is the cost of spawning a thread? Is it worth even worrying about when designing software? One of the costs to creating a thread must be its own stack pointer and process counter, then space to copy all of the working registers to as it is moved on and off of a core by the scheduler, but what else?
Is the amount of stack available for one program split equally between threads of a process or on a first come first served?
Can I somehow check the hardware on start up (of the program) for number of cores. If I am running on a machine with N cores, should I keep the number of threads to N-1?
then space to copy all of the working registeres to as it is moved on
and off of a core by the scheduler, but what else?
One less evident cost is the strain imposed on the scheduler which may start to choke if it needs to juggle thousands of threads. The memory isn't really the issue. With the right tweaking you can get a "thread" to occupy very little memory, little more than its stack. This tweaking could be difficult (i.e. using clone(2) directly under linux etc) but it can be done.
Is the amount of stack available for one program split equally between
threads of a process or on a first come first served
Each thread gets its own stack, and typically you can control its size.
If I am running on a machine with N cores, should I keep the number of
threads to N-1
Checking the number of cores is easy, but environment-specific. However, limiting the number of threads to the number of cores only makes sense if your workload consists of CPU-intensive operations, with little I/O. If I/O is involved you may want to have many more threads than cores.
You should be as thoughtful as possible in everything you design and implement.
I know that a Java thread stack takes up about 1MB each time you create a thread. , so they add up.
Threads make sense for asynchronous tasks that allow long-running activities to happen without preventing all other users/processes from making progress.
Threads are managed by the operating system. There are lots of schemes, all under the control of the operating system (e.g. round robin, first come first served, etc.)
It makes perfect sense to me to assign one thread per core for some activities (e.g. computationally intensive calculations, graphics, math, etc.), but that need not be the deciding factor. One app I develop uses roughly 100 active threads in production; it's not a 100 core machine.
To add to the other excellent posts:
'What is the cost of spawning a thread? Is it worth even worrying about when designing software?'
It is if one of your design choices is doing such a thing often. A good way of avoiding this issue is to create threads once, at app startup, by using pools and/or app-lifetime threads dedicated to operations. Inter-thread signaling is much quicker than continual thread creation/termination/destruction and also much safer/easier.
The number of posts concerning problems with thread stopping, terminating, destroying, thread count runaway, OOM failure etc. is ledgendary. If you can avoid doing it at all, great.

Can thread creation within OS internals run concurrently?

Suppose we have a dual-core machine with a mainstream, modern OS capable to utilize both the cores.
If I have two threads, P1 and Q1 within the same process, and they happen to commence creating child threads, say, P2 and Q2, at approximately the same machine cycle, will OS perform the thread creation concurrently?
I heard thread creation is expensive, so the question came forth...
Thanks in advance.
Any reasonably well designed OS can have multiple processors executing kernel code at the same time. Therefore some of the tasks involved in a thread creation can be happening concurrently. But there will be some necessary serialization to manipulate some shared data structures (e.g. allocating memory, inserting a newly created threat structure into a global list). The processors could contend for the same lock thereby reducing concurrency.
Systems/applications which make new threads so often that the overhead of thread creation actually matters are probably designed wrong (doing too little useful work in a thread relative to the startup time, and not taking advantage of the obvious optimization of reusing short-lived threads from a pool).
It will be sorta-concurrently. There are aspects of thread-creation that cannot proceed in parallel - it would be unfortunate if the kernel memory-manager allocated both threads the same stack!
Thread creation is sufficiently expensive that it's worth while avoiding doing it at all during an app. run, hence the popularity of thread pools. Long-running tasks that block can be threaded off and left for the life of the app - often this means that explicit thread termination, (awkward at best, almost impossible at worst, from user code), is not necessary.
I think developers continually start and stop threads because they like to think of them as 'functions', where you 'pass parameters' in at the start and 'return' results when the thread ends. Ths is not the best way of conceptualizing threads.

Thread and Process

What is the best definition of a thread and what is a process?
If I call a function, how do I know that a thread is calling it or a process (or am I not understanding it??!). This is in a multi-core system (quadcore).
From http://wiki.answers.com/Q/What_is_the_difference_between_a_computer_process_and_thread:
A single process can have multiple threads that share global data and address space with other threads running in the same process, and therefore can operate on the same data set easily. Processes do not share address space and a different mechanism must be used if they are to share data.
If we consider running a word processing program to be a process, then the auto-save and spell check features that occur in the background are different threads of that process which are all operating on the same data set (your document).
One thing to add is how does a multi-core processor handle this. Think of a thread as the sequential execution of your code.
A core in a CPU can only execute one thread at a time. So if this thread is blocked because the program is waiting for an I/O operation to finish, the process is blocked (very simplified example: Word not responding). Multi-threading allows us to execute multiple code paths at the same time. "Same time" is a bit of a lie, since only one thread can actually execute at a time in a core, but the CPU gives some small chunk of time to each thread, so it appears as if all these threads are executing at the same time. A good example here is the spell checker in Word.
If you have multiple cores, the only difference is that in an N-Core CPU you can have N threads executing at the same time. To simplify a lot, it doesn't matter what process the threads belong to. To simply even further, you'd expect a N times performance increase. :-D
In every modern OS I know of, everything runs in a thread, which runs in a process.
The OS can keep track of multiple processes, and each process can host an arbitrary number of threads. So all code is executed within a thread and within a process (since the thread runs in a process).
The main distinction between the two is that each process has its own virtual address space. Separate processes do not have access to each others' data, file handles or anything else, and are essentially not aware that other processes exist.
On the other hand, every thread in a process share the same address space, and all threads can therefore inspect or modify each others' data, call the same functions and everything else.
It is often (but not always) the cases that one program consists of one process and a number of threads.
A process is composed of one or more threads (one by default for most environments). A process can create additional threads though.
Like the previous answer says, each Process has its own memory space (each can have a pointer to 0x12345, with that memory location having different values for each process), while all the Threads of a process would actually point to the exact same memory location, since they're all in the same memory space.
When calling a function, it's almost always called on the same thread that the caller is running on. In Objective-C, there are exceptions (performSelectorOnMainThread), and there might be for other languages as well, but that sort of functionality is necessary only in special cases.
From a user's point of view, the main distinction is that threads share memory with each other, while processes do not. That means you can easily share data between threads, while processes require some kind of OS call to do so.
Some call this a benifit of threads, but sharing data between multiple threads of control is fraught with danger, so it can be argued that processes lead to more reliable code.
There's a lot more to it, particularly if you are an OS person.

Resources