Does multithreading actually work in uniprocessor environment

Does multithreading actually work in uniprocessor environment - multithreading

Suppose we have a process with multiple threads in a uniprocessor.
Now I know that if we have several processes, only one of them will be processed at a time in a uniprocessor and hence the processes are not concurrent.
If my understanding is correct, similarly each thread will be processed at a time and not concurrent in a uniprocessor. Is this statement true? If so then does multithreading mean having more than one thread in a process and does not mean running multiple threads at a time? And does that mean there's no benefit of creating user threads in a uniprocessor environment?

TL;DR: threads are switching more often than processes and in real time we have an effect of concurrency because it is happens really fast.
when you wrote:
each thread will be processed at a time and not concurrent in a uni processor
Notice the word "concurrent", there is no real concurrency in uni processor, there is only effect of that thanks to the multiple number of context switches between processes.
Let's clarify something here, the single core of the CPU can handle one thread at a given time, each process has a main thread and (if needed) more threads running together. If a process A is now running and it has 3 threads: A1(main thread), A2, A3 all three will be running as long as process A is being processed by the CPU core. When a context switch occur process A is no longer running and now process B will run with his threads.
About this statement:
there's no benefit of creating user threads in a uni processor environment
That is not true. there is a benefit in creating threads, they are easier to create ("spawn" as in the books) and shearing the process heap memory. Creating a sub process ("child" as in the books) is a overhead comparing to a thread because a process need to have his own memory. For example each google chrome tab is a process not a thread, but this tab has multiple threads running concurrency with little responsibility.

If you are still somehow running a computer with just one, single-core, CPU, then you would be correct to observe that only one thread can be physically executing at one time. But that does not negate the value of breaking up the application into multiple threads and/or processes.
The essential benefit is concurrency. When one thread is waiting (e.g. for an input/output operation to complete), there is something else for the CPU to be doing in the meantime: it can be running a different thread that isn't waiting. With a carefully designed application, you can get much better utilization of every part of the hardware, more parallelism, and thus, more throughput.
My favorite go-to example is a fast food restaurant. About a dozen workers, each one doing different things, cooperate to bring your order to you. Even if one of them (say, "the fry guy") is standing around, someone else always has something to do. Several orders are in-process at once. This overlap, this "concurrency," is what you are shooting for – regardless of how many CPUs you have.
Multithreading is also commonly used with GUI applications that also need to do some kind of "heavy lifting." One thread handles the GUI interaction (and has no other real responsibilities) while other threads, with a slightly inferior priority (or "niceness") do the lifting. When a GUI event comes in, the GUI thread pre-empts the others and responds to it immediately, then of course goes right back to sleep again. But in this way the GUI always remains very responsive – even though the other threads are doing "heavy lifting" things, GUI messages are still handled very promptly. (I scooped-up about a 25% performance improvement by re-tooling an older application to use this approach, because the application was no longer "polling" for GUI events.)

The first question I ask about any thread is, "what does it wait for?" To me, a thread is defined by what event it waits for and what it does when that event happens.
Threads were in wide-spread use for at least a decade before multi-processor computers became commercially available. They are useful when you want to write a program that has to respond to un-synchronized events that come from multiple different sources. There's a few different ways to model a program like that. One way is to have a different thread to wait on each different event source. The next most popular is an event driven architecture in which there's a main loop that waits for all events and calls different event handler functions for each of the different kinds of event.
The multi-threaded style of program often is easier to read* because there's usually different activities going on inside the program, and the state of each activity can be implicit in the context (i.e., registers and call stack) of the thread that's driving it, while in the event-driven model, each activity's state must be explicitly encoded in some object.
The implicit-in-the-context way of keeping the state is much closer to the procedural style of coding a single activity that we learn as beginners.
*Easier to read does not mean that the code is easy to write without making bad and non-obvious mistakes!!

The main impetus for developing threads was Ada compliance. Prior to that, different operating systems had their own ways of handing multiple things at once. In eunuchs, the way to do more than one thing was to spin off a new process. In VMS, software interrupts (aka Asynchronous System Traps or Asynchronous Procedure Calls in Windoze). In those days (1970's) multiprocessor systems were rare.
One of the goals of Ada was to have a system independent way of doing things. It adopted the "task" which is effectively a thread. In order to support Ada, compiler developers had to include task (thread) libraries.
With the rise of multiprocessors, operating systems started to make threads (rather than processes) the basic schedulable unit in a system.
Threads then give a way for programs to handle multiple things simultaneously, even if there is only one processor. Sadly, support for threads in programming languages has been woefully lacking. Ada is the only major language I can think of that has real support for threads (tasks). Thread support in Java, for example, is a complete, sick joke. The result is threads are not as effective in practice as they could be.

Related

Do Rust threads run at the same time in parallel? Documentation sounds like it does not [duplicate]

I want to know if a program can run two threads at the same time (that is basically what it is used for correct?). But if I were to do a system call in one function where it runs on thread A, and have some other tasks running in another function where it runs on thread B, would they both be able to run at the same time or would my second function wait until the system call finishes?
Add-on to my original question: Now would this process still be an uninterruptable process while the system call is going on? I am talking about using any system call on UNIX/LINUX.

Multi-threading and parallel processing are two completely different topics, each worthy of its own conversation, but for the sake of introduction...
Threading:
When you launch an executable, it is running in a thread within a process. When you launch another thread, call it thread 2, you now have 2 separately running execution chains (threads) within the same process. On a single core microprocessor (uP), it is possible to run multiple threads, but not in parallel. Although conceptually the threads are often said to run at the same time, they are actually running consecutively in time slices allocated and controlled by the operating system. These slices are interleaved with each other. So, the execution steps of thread 1 do not actually happen at the same time as the execution steps of thread 2. These behaviors generally extend to as many threads as you create, i.e. packets of execution chains all working within the same process and sharing time slices doled out by the operating system.
So, in your system call example, it really depends on what the system call is as to whether or not it would finish before allowing the execution steps of the other thread to proceed. Several factors play into what will happen: Is it a blocking call? Does one thread have more priority than the other. What is the duration of the time slices?
Links relevant to threading in C:
SO Example
POSIX
ANSI C
Parallel Processing:
When multi-threaded program execution occurs on a multiple core system (multiple uP, or multiple multi-core uP) threads can run concurrently, or in parallel as different threads may be split off to separate cores to share the workload. This is one example of parallel processing.
Again, conceptually, parallel processing and threading are thought to be similar in that they allow things to be done simultaneously. But that is concept only, they are really very different, in both target application and technique. Where threading is useful as a way to identify and split out an entire task within a process (eg, a TCP/IP server may launch a worker thread when a new connection is requested, then connects, and maintains that connection as long as it remains), parallel processing is typically used to send smaller components of the same task (eg. a complex set of computations that can be performed independently in separate locations) off to separate resources (cores, or uPs) to be completed simultaneously. This is where multiple core processors really make a difference. But parallel processing also takes advantage of multiple systems, popular in areas such as genetics and MMORPG gaming.
Links relevant to parallel processing in C:
OpenMP
More OpenMP (examples)
Gribble Labs - Introduction to OpenMP
CUDA Tookit from NVIDIA
Additional reading on the general topic of threading and architecture:
This summary of threading and architecture barely scratches the surface. There are many parts to the the topic. Books to address them would fill a small library, and there are thousands of links. Not surprisingly within the broader topic some concepts do not seem to follow reason. For example, it is not a given that simply having more cores will result in faster multi-threaded programs.

Yes, they would, at least potentially, run "at the same time", that's exactly what threads are for; of course there are many details, for example:
If both threads run system calls that e.g. write to the same file descriptor they might temporarily block each other.
If thread synchronisation primitives like mutexes are used then the parallel execution will be blocked.
You need a processor with at least two cores in order to have two threads truly run at the same time.
It's a very large and very complex subject.

If your computer has only a single CPU, you should know, how it can execute more than one thread at the same time.
In single-processor systems, only a single thread of execution occurs at a given instant. because Single-processor systems support logical concurrency, not physical concurrency.
On multiprocessor systems, several threads do, in fact, execute at the same time, and physical concurrency is achieved.
The important feature of multithreaded programs is that they support logical concurrency, not whether physical concurrency is actually achieved.

The basics are simple, but the details get complex real quickly.
You can break a program into multiple threads (if it makes sense to do so), and each thread will run "at its own pace", such that if one must wait for, eg, some file I/O that doesn't slow down the others.
On a single processor multiple threads are accommodated by "time slicing" the processor somehow -- either on a simple clock basis or by letting one thread run until it must wait (eg, for I/O) and then "switching" to the next thread. There is a whole art/science to doing this for maximum efficiency.
On a multi-processor (such as most modern PCs which have from 2 to 8 "cores") each thread is assigned to a separate processor, and if there are not enough processors then they are shared as in the single processor case.
The whole area of assuring "atomicity" of operations by a single thread, and assuring that threads don't somehow interfere with each other is incredibly complex. In general a there is a "kernel" or "nucleus" category of system call that will not be interrupted by another thread, but thats only a small subset of all system calls, and you have to consult the OS documentation to know which category a particular system call falls into.

They will run at the same time, for one thread is independent from another, even if you perform a system call.
It's pretty easy to test it though, you can create one thread that prints something to the console output and perform a system call at another thread, that you know will take some reasonable amount of time. You will notice that the messages will continue to be printed by the other thread.

Yes, A program can run two threads at the same time.
it is called Multi threading.
would they both be able to run at the same time or would my second function wait until the system call finishes?
They both are able to run at the same time.
if you want, you can make thread B wait until Thread A completes or reverse

Two thread can run concurrently only if it is running on multiple core processor system, but if it has only one core processor then two threads can not run concurrently. So only one thread run at a time and if it finishes its job then the next thread which is on queue take the time.

What's the point of multi-threading on a single core?

I've been playing with the Linux kernel recently and diving back into the days of OS courses from college.
Just like back then, I'm playing around with threads and the like. All this time I had been assuming that threads were automatically running concurrently on multiple cores but I've recently discovered that you actually have to explicitly code for handling multiple cores.
So what's the point of multi-threading on a single core? The only example I can think of is from college when writing a client/server program but that seems like a weak point.

All this time I had been assuming that threads were automatically
running concurrently on multiple cores but I've recently discovered
that you actually have to explicitly code for handling multiple cores.
The above is incorrect for any widely used, modern OS. All of Linux's schedulers, for example, will automatically schedule threads on different cores and even automatically move threads from one core to another when necessary to maximize core usage. There are some APIs that allow you to modify the schedulers' behavior, but these APIs are generally used to disable automatic thread-to-core scheduling, not to enable it.
So what's the point of multi-threading on a single core?
Imagine you have a GUI program whose purpose is to execute an expensive computation (for example, render a 3D image or a Mandelbrot set) and then display the result. Let's say this computation takes 30 seconds to complete on this particular CPU. If you implement that program the obvious way, and use only a single thread, then the user's GUI controls will be unresponsive for 30 seconds while the calculation is executing -- the user will be unable to do anything with your program, and possibly unable to do anything with his computer at all. Since users expect GUI controls to be responsive at all times, that would be a poor user experience.
If you implement that program with two threads (one GUI thread and one rendering thread), on the other hand, the user will be able to click buttons, resize the window, quit the program, choose menu items, etc, even while the computation is executing, because the OS is able to wake up the GUI thread and allow it to handle mouse/keyboard events when necessary.
Of course, it is possible to write this program with a single thread and keep its GUI responsive, by writing your single thread to do just a few milliseconds worth of computation, then check to see if there are GUI events available to process, handling them, then going back to do a bit more computation, etc. But if you code your app this way, you are essentially writing your own (very primitive) thread scheduler inside your app anyway, so why reinvent the wheel?
The first versions of MacOS were designed to run on a single core, but had no real concept of multithreading. This forced every application developer to correctly implement some manual thread management -- even if their app did not have any extended computations, they had to explicitly indicate when they were done using the CPU, e.g. by calling WaitNextEvent. This lack of multithreading made early (pre-MacOS-X) versions of MacOS famously unreliable at multitasking, since just one poorly written application could bring the whole computer to a grinding halt.

First, a program not only computes, but also waits for input/output and so can be considered as executing on an I/O processor. So even single-core machine is a multi-processor machine, and employing of multi-threading is justified.
Second, a task can be divided in several threads in the sake of modularity.

Multithreading is not only for taking advantage of multiple cores.
You need multiple processes for multitasking. For similar reason you are allowed to have multiple threads, which are lightweight compared with processes.
You probably don't want to spawn processes all the time for things like blocking I/O. That may be overkill.
And there is fiber, which is even more lightweight. So we have process, thread, and fiber for different levels of needs.

Well, when you say multithreading on a single core, there are things you need to consider. For example, the thread API that you are using - is it user level or kernel level. Most probably from you question I believe you are using user level threads.
Now, user level threads, depending upon the host OS or the API itself may map to single kernel thread or multiple. Many relations are possible like 1-1,many-1 or many-many.
Now, if there is a single core, your OS can still provide you several Kernel level threads which may behave as multiple processes to the CPU. In which case, OS will give you a time-slicing (and multi-programming) on the kernel threads leading to superfast context switch and via the user level API - you/your code will seem to have multithreaded features.
Also note that eventhough your processor is a single core, depending on the make, it can be hyperthreaded and have super deep pipelines allowing the concurrent running of Kernel threads with very low overhead.
For references: Check Intel/AMD architecture and how various OS provide Kernel threads.

Why are OS threads considered expensive?

There are many solutions geared toward implementing "user-space" threads. Be it golang.org goroutines, python's green threads, C#'s async, erlang's processes etc. The idea is to allow concurrent programming even with a single or limited number of threads.
What I don't understand is, why are the OS threads so expensive? As I see it, either way you have to save the stack of the task (OS thread, or userland thread), which is a few tens of kilobytes, and you need a scheduler to move between two tasks.
The OS provides both of this functions for free. Why should OS threads be more expensive than "green" threads? What's the reason for the assumed performance degradation caused by having a dedicated OS thread for each "task"?

I want to amend Tudors answer which is a good starting point. There are two main overheads of threads:
Starting and stopping them. Involves creating a stack and kernel objects. Involves kernel transitions and global kernel locks.
Keeping their stack around.
(1) is only a problem if you are creating and stopping them all the time. This is solved commonly using thread pools. I consider this problem to be practically solved. Scheduling a task on a thread pool usually does not involve a trip to the kernel which makes it very fast. The overhead is on the order of a few interlocked memory operations and a few allocations.
(2) This becomes important only if you have many threads (> 100 or so). In this case async IO is a means to get rid of the threads. I found that if you don't have insane amounts of threads synchronous IO including blocking is slightly faster than async IO (you read that right: sync IO is faster).

Saving the stack is trivial, no matter what its size - the stack pointer needs to be saved in the Thread Info Block in the kernel, (so usualy saving most of the registers as well since they will have been pushed by whatever soft/hard interrupt caused the OS to be entered).
One issue is that a protection level ring-cycle is required to enter the kernel from user. This is an essential, but annoying, overhead. Then the driver or system call has to do whatever was requested by the interrupt and then the scheduling/dispatching of threads onto processors. If this results in the preemption of a thread from one process by a thread from another, a load of extra process context has to be swapped as well. Even more overhead is added if the OS decides that a thread that is running on another processor core than the one handling the interrupt mut be preempted - the other core must be hardware-interrupted, (this is on top of the hard/soft interrupt that entred the OS in the first place.
So, a scheduling run may be quite a complex operation.
'Green threads' or 'fibers' are, (usually), scheduled from user code. A context-change is much easier and cheaper than an OS interrupt etc. because no Wagnerian ring-cycle is required on every context-change, process-context does not change and the OS thread running the green thread group does not change.
Since something-for-nothing does not exist, there are problems with green threads. They ar run by 'real' OS threads. This means that if one 'green' thread in a group run by one OS thread makes an OS call that blocks, all green threads in the group are blocked. This means that simple calls like sleep() have to be 'emulated' by a state-machine that yields to other green threads, (yes, just like re-implementing the OS). Similarly, any inter-thread signalling.
Also, of course, green threads cannot directly respond to IO signaling, so somewhat defeating the point of having any threads in the first place.

There are many solutions geared toward implementing "user-space" threads. Be it golang.org goroutines, python's green threads, C#'s async, erlang's processes etc. The idea is to allow concurrent programming even with a single or limited number of threads.
It's an abstraction layer. It's easier for many people to grasp this concept and use it more effectively in many scenarios. It's also easier for many machines (assuming a good abstraction), since the model moves from width to pull in many cases. With pthreads (as an example), you have all the control. With other threading models, the idea is to reuse threads, for the process of creating a concurrent task to be inexpensive, and to use a completely different threading model. It's far easier to digest this model; there's less to learn and measure, and the results are generally good.
What I don't understand is, why are the OS threads so expensive? As I see it, either way you have to save the stack of the task (OS thread, or userland thread), which is a few tens of kilobytes, and you need a scheduler to move between two tasks.
Creating a thread is expensive, and the stack requires memory. As well, if your process is using many threads, then context switching can kill performance. So lightweight threading models became useful for a number of reasons. Creating an OS thread became a good solution for medium to large tasks, ideally in low numbers. That's restrictive, and quite time consuming to maintain.
A task/thread pool/userland thread does not need to worry about much of the context switching or thread creation. It's often "reuse the resource when it becomes available, if it's not ready now -- also, determine the number of active threads for this machine".
More commmonly (IMO), OS level threads are expensive because they are not used correctly by the engineers - either there are too many and there is a ton of context switching, there is competition for the same set of resources, the tasks are too small. It takes much more time to understand how to use OS threads correctly, and how to apply that best to the context of a program's execution.
The OS provides both of this functions for free.
They're available, but they are not free. They are complex, and very important to good performance. When you create an OS thread, it's given time 'soon' -- all the process' time is divided among the threads. That's not the common case with user threads. The task is often enqueued when the resource is not available. This reduces context switching, memory, and the total number of threads which must be created. When the task exits, the thread is given another.
Consider this analogy of time distribution:
Assume you are at a casino. There are a number people who want cards.
You have a fixed number of dealers. There are fewer dealers than people who want cards.
There is not always enough cards for every person at any given time.
People need all cards to complete their game/hand. They return their cards to the dealer when their game/hand is complete.
How would you ask the dealers to distribute cards?
Under the OS scheduler, that would be based on (thread) priority. Every person would be given one card at a time (CPU time), and priority would be evaluated continually.
The people represent the task or thread's work. The cards represent time and resources. The dealers represent threads and resources.
How would you deal fastest if there were 2 dealers and 3 people? and if there were 5 dealers and 500 people? How could you minimize running out of cards to deal? With threads, adding cards and adding dealers is not a solution you can deliver 'on demand'. Adding CPUs is equivalent to adding dealers. Adding threads is equivalent to dealers dealing cards to more people at a time (increases context switching). There are a number of strategies to deal cards more quickly, especially after you eliminate the people's need for cards in a certain amount of time. Would it not be faster to go to a table and deal to a person or people until their game is complete if the dealer to people ratio were 1/50? Compare this to visiting every table based on priority, and coordinating visitation among all dealers (the OS approach). That's not to imply the OS is stupid -- it implies that creating an OS thread is an engineer adding more people and more tables, potentially more than the dealers can reasonably handle. Fortunately, the constraints may be lifted in many cases by using other multithreading models and higher abstractions.
Why should OS threads be more expensive than "green" threads? What's the reason for the assumed performance degradation caused by having a dedicated OS thread for each "task"?
If you developed a performance critical low level threading library (e.g. upon pthreads), you would recognize the importance of reuse (and implement it in your library as a model available for users). From that angle, the importance of higher level multithreading models is a simple and obvious solution/optimization based on real world usage as well as the ideal that the entry bar for adopting and effectively utilizing multithreading can be lowered.
It's not that they are expensive -- the lightweight threads' model and pool is a better solution for many problems, and a more appropriate abstraction for engineers who do not understand threads well. The complexity of multithreading is greatly simplified (and often more performant in real world usage) under this model. With OS threads, you do have more control, but several more considerations must be made to use them as effectively as possible -- heeding these consideration can dramatically reflow a program's execution/implementation. With higher level abstractions, many of these complexities are minimized by completely altering the flow of task execution (width vs pull).

The problem with starting kernel threads for each small task is that it incurs a non-negligible overhead to start and stop, coupled with the stack size it needs.
This is the first important point: thread pools exist so that you can recycle threads, in order to avoid wasting time starting them as well as wasting memory for their stacks.
Secondly, if you fire off threads to do asynchronous I/O, they will spend most of their time blocked waiting for the I/O to complete, thus effectively not doing any work and wasting memory. A much better option is to have a single worker handle multiple async calls (through some under-the-hood scheduling technique, such as multiplexing), thus again saving memory and time.
One thing that makes "green" threads faster than kernel threads is that they are user-space objects, managed by a virtual machine. Starting them is a user space call, while starting a thread is a kernel-space call that is much slower.

A person in Google shows an interesting approach.
According to him, kernel mode switching itself is not the bottleneck, and the core cost happen on SMP scheduler. And he claims M:N schedule assisted by kernel wouldn't be expensive, and this makes me to expect general M:N threading to be available on every languages.

Because the OS. Imagine that instead of asking you to clean the house your grandmother has to call the social service that does some paperwork and a week after assigns a social worker for helping her. The worker can be called off at any time and replaced with another one, which again takes several days.
That's pretty ineffective and slow, huh?
In this metaphor you are a userland coroutine scheduler, the social service is an OS with its kernel-level thread scheduler, and a social worker is a fully-fledged thread.

I think the two things are in different levels.
Thread or Process is an instance of the program which is being executed. In a process/thread there is much more things in it. Execution stack, opening files, signals, processors status, and a many other things.
Greentlet is different, it is runs in vm. It supplies a light-weight thread. Many of them supply a pseudo-concurrently (typically in a single or a few OS-level threads). And often they supply a lock-free method by data-transmission instead of data sharing.
So, the two things focus different, so the weight are different.
And In my mind, the greenlet should be finished in the VM not the OS.

Processes, threads, green threads, protothreads, fibers, coroutines: what's the difference?

I'm reading up on concurrency. I've got a bit over my head with terms that have confusingly similar definitions. Namely:
Processes
Threads
"Green threads"
Protothreads
Fibers
Coroutines
"Goroutines" in the Go language
My impression is that the distinctions rest on (1) whether truly parallel or multiplexed; (2) whether managed at the CPU, at the OS, or in the program; and (3..5) a few other things I can't identify.
Is there a succinct and unambiguous guide to the differences between these approaches to parallelism?

OK, I'm going to do my best. There are caveats everywhere, but I'm going to do my best to give my understanding of these terms and references to something that approximates the definition I've given.
Process: OS-managed (possibly) truly concurrent, at least in the presence of suitable hardware support. Exist within their own address space.
Thread: OS-managed, within the same address space as the parent and all its other threads. Possibly truly concurrent, and multi-tasking is pre-emptive.
Green Thread: These are user-space projections of the same concept as threads, but are not OS-managed. Probably not truly concurrent, except in the sense that there may be multiple worker threads or processes giving them CPU time concurrently, so probably best to consider this as interleaved or multiplexed.
Protothreads: I couldn't really tease a definition out of these. I think they are interleaved and program-managed, but don't take my word for it. My sense was that they are essentially an application-specific implementation of the same kind of "green threads" model, with appropriate modification for the application domain.
Fibers: OS-managed. Exactly threads, except co-operatively multitasking, and hence not truly concurrent.
Coroutines: Exactly fibers, except not OS-managed.
Goroutines: They claim to be unlike anything else, but they seem to be exactly green threads, as in, process-managed in a single address space and multiplexed onto system threads. Perhaps somebody with more knowledge of Go can cut through the marketing material.
It's also worth noting that there are other understandings in concurrency theory of the term "process", in the process calculus sense. This definition is orthogonal to those above, but I just thought it worth mentioning so that no confusion arises should you see process used in that sense somewhere.
Also, be aware of the difference between parallel and concurrent. It's possible you were using the former in your question where I think you meant the latter.

I mostly agree with Gian's answer, but I have different interpretations of a few concurrency primitives. Note that these terms are often used inconsistently by different authors. These are my favorite definitions (hopefully not too far from the modern consensus).
Process:
OS-managed
Each has its own virtual address space
Can be interrupted (preempted) by the system to allow another process to run
Can run in parallel with other processes on different processors
The memory overhead of processes is high (includes virtual memory tables, open file handles, etc)
The time overhead for creating and context switching between processes is relatively high
Threads:
OS-managed
Each is "contained" within some particular process
All threads in the same process share the same virtual address space
Can be interrupted by the system to allow another thread to run
Can run in parallel with other threads on different processors
The memory and time overheads associated with threads are smaller than processes, but still non-trivial
(For example, typically context switching involves entering the kernel and invoking the system scheduler.)
Cooperative Threads:
May or may not be OS-managed
Each is "contained" within some particular process
In some implementations, each is "contained" within some particular OS thread
Cannot be interrupted by the system to allow a cooperative peer to run
(The containing process/thread can still be interrupted, of course)
Must invoke a special yield primitive to allow peer cooperative threads to run
Generally cannot be run in parallel with cooperative peers
(Though some people think it's possible: http://ocm.dreamhosters.com/.)
There are lots of variations on the cooperative thread theme that go by different names:
Fibers
Green threads
Protothreads
User-level threads (user-level threads can be interruptable/preemptive, but that's a relatively unusual combination)
Some implementations of cooperative threads use techniques like split/segmented stacks or even individually heap-allocating every call frame to reduce the memory overhead associated with pre-allocating a large chunk of memory for the stack
Depending on the implementation, calling a blocking syscall (like reading from the network or sleeping) will either cause a whole group of cooperative threads to block or implicitly cause the calling thread to yield
Coroutines:
Some people use "coroutine" and "cooperative thread" more or less synonymously
I do not prefer this usage
Some coroutine implementations are actually "shallow" cooperative threads; yield can only be invoked by the "coroutine entry procedure"
The shallow (or semi-coroutine) version is easier to implement than threads, because each coroutine does not need a complete stack (just one frame for the entry procedure)
Often coroutine frameworks have yield primitives that require the invoker to explicitly state which coroutine control should transfer to
Generators:
Restricted (shallow) coroutines
yield can only return control back to whichever code invoked the generator
Goroutines:
An odd hybrid of cooperative and OS threads
Cannot be interrupted (like cooperative threads)
Can run in parallel on a language runtime-managed pool of OS threads
Event handlers:
Procedures/methods that are invoked by an event dispatcher in response to some action happening
Very popular for user interface programming
Require little to no language/system support; can be implemented in a library
At most one event handler can be running at a time; the dispatcher must wait for a handler to finish (return) before starting the next
Makes synchronization relatively simple; different handler executions never overlap in time
Implementing complex tasks with event handlers tends to lead to "inverted control flow"/"stack ripping"
Tasks:
Units of work that are doled out by a manager to a pool of workers
The workers can be threads, processes or machines
Of course the kind of worker a task library uses has a significant impact on how one implements the tasks
In this list of inconsistently and confusingly used terminology, "task" takes the crown. Particularly in the embedded systems community, "task" is sometimes used to mean "process", "thread" or "event handler" (usually called an "interrupt service routine"). It is also sometimes used generically/informally to refer to any kind of unit of computation.
One pet peeve that I can't stop myself from airing: I dislike the use of the phrase "true concurrency" for "processor parallelism". It's quite common, but I think it leads to much confusion.
For most applications, I think task-based frameworks are best for parallelization. Most of the popular ones (Intel's TBB, Apple's GCD, Microsoft's TPL & PPL) use threads as workers. I wish there were some good alternatives that used processes, but I'm not aware of any.
If you're interested in concurrency (as opposed to processor parallelism), event handlers are the safest way to go. Cooperative threads are an interesting alternative, but a bit of a wild west. Please do not use threads for concurrency if you care about the reliability and robustness of your software.

Protothreads are just a switch case implementation that acts like a state machine but makes implementation of the software a whole lot simpler. It is based around idea of saving a and int value before a case label and returning and then getting back to the point after the case by reading back that variable and using switch to figure out where to continue. So protothread are a sequential implementation of a state machine.

Protothreads are great when implementing sequential state machines. Protothreads are not really threads at all, but rather a syntax abstraction that makes it much easier to write a switch/case state machine that has to switch states sequentially (from one to the next etc..).
I have used protothreads to implement asynchronous io: http://martinschroder.se/asynchronous-io-using-protothreads/

Threads & Processes Vs MultiThreading & Multi-Core/MultiProcessor : How they are mapped?

I was very confused but the following thread cleared my doubts:
Multiprocessing, Multithreading,HyperThreading, Multi-core
But it addresses the queries from the hardware point of view. I want to know how these hardware features are mapped to software?
One thing that is obvious is that there is no difference between MultiProcessor(=Mutlicpu) and MultiCore other than that in multicore all cpus reside on one chip(die) where as in Multiprocessor all cpus are on their own chips & connected together.
So, mutlicore/multiprocessor systems are capable of executing multiple processes (firefox,mediaplayer,googletalk) at the "sametime" (unlike context switching these processes on a single processor system) Right?
If it correct. I'm clear so far. But the confusion arises when multithreading comes into picture.
MultiThreading "is for" parallel processing. right?
What are elements that are involved in multithreading inside cpu? diagram? For me to exploit the power of parallel processing of two independent tasks, what should be the requriements of CPU?
When people say context switching of threads. I don't really get it. because if its context switching of threads then its not parallel processing. the threads must be executed "scrictly simultaneously". right?
My notion of multithreading is that:
Considering a system with single cpu. when process is context switched to firefox. (suppose) each tab of firefox is a thread and all the threads are executing strictly at the same time. Not like one thread has executed for sometime then again another thread has taken until the context switch time is arrived.
What happens if I run a multithreaded software on a processor which can't handle threads? I mean how does the cpu handle such software?
If everything is good so far, now question is HOW MANY THREADS? It must be limited by hardware, I guess? If hardware can support only 2 threads and I start 10 threads in my process. How would cpu handle it? Pros/Cons? From software engineering point of view, while developing a software that will be used by the users in wide variety of systems, Then how would I decide should I go for multithreading? if so, how many threads?

First, try to understand the concept of 'process' and 'thread'. A thread is a basic unit for execution: a thread is scheduled by operating system and executed by CPU. A process is a sort of container that holds multiple threads.
Yes, either multi-processing or multi-threading is for parallel processing. More precisely, to exploit thread-level parallelism.
Okay, multi-threading could mean hardware multi-threading (one example is HyperThreading). But, I assume that you just say multithreading in software. In this sense, CPU should support context switching.
Context switching is needed to implement multi-tasking even in a physically single core by time division.
Say there are two physical cores and four very busy threads. In this case, two threads are just waiting until they will get the chance to use CPU. Read some articles related to preemptive OS scheduling.
The number of thread that can physically run in concurrent is just identical to # of logical processors. You are asking a general thread scheduling problem in OS literature such as round-robin..
I strongly suggest you to study basics of operating system first. Then move on multithreading issues. It seems like you're still unclear for the key concepts such as context switching and scheduling. It will take a couple of month, but if you really want to be an expert in computer software, then you should know such very basic concepts. Please take whatever OS books and lecture slides.

Threads running on the same core are not technically parallel. They only appear to be executed in parallel, as the CPU switches between them very fast (for us, humans). This switch is what is called context switch.
Now, threads executing on different cores are executed in parallel.
Most modern CPUs have a number of cores, however, most modern OSes (windows, linux and friends) usually execute much larger number of threads, which still causes context switches.
Even if no user program is executed, still OS itself performs context switches for maintanance work.
This should answer 1-3.
About 4: basically, every processor can work with threads. it is much more a characteristic of operating system. Thread is basically: memory (optional), stack and registers, once those are replaced you are in another thread.
5: the number of threads is pretty high and is limited by OS. Usually it is higher than regular programmer can successfully handle :)
The number of threads is dictated by your program:
is it IO bound?
can the task be divided into a number of smaller tasks?
how small is the task? the task can be too small to make it worth to spawn threads at all.
synchronization: if extensive synhronization is required, the penalty might be too heavy and you should reduce the number of threads.

Multiple threads are separate 'chains' of commands within one process. From CPU point of view threads are more or less like processes. Each thread has its own set of registers and its own stack.
The reason why you can have more threads than CPUs is that most threads don't need CPU all the time. Thread can be waiting for user input, downloading something from the web or writing to disk. While it is doing that, it does not need CPU, so CPU is free to execute other threads.
In your example, each tab of Firefox probably can even have several threads. Or they can share some threads. You need one for downloading, one for rendering, one for message loop (user input), and perhaps one to run Javascript. You cannot easily combine them because while you download you still need to react to user's input. However, download thread is sleeping most of the time, and even when it's downloading it needs CPU only occasionally, and message loop thread only wakes up when you press a button.
If you go to task manager you'll see that despite all these threads your CPU use is still quite low.
Of course if all your threads do some number-crunching tasks, then you shouldn't create too many of them as you get no performance benefit (though there may be architectural benefits!).
However, if they are mainly I/O bound then create as many threads as your architecture dictates. It's hard to give advice without knowing your particular task.

Broadly speaking, yeah, but "parallel" can mean different things.
It depends what tasks you want to run in parallel.
Not necessarily. Some (indeed most) threads spend a lot of time doing nothing. Might as well switch away from them to a thread that wants to do something.
The OS handles thread switching. It will delegate to different cores if it wants to. If there's only one core it'll divide time between the different threads and processes.
The number of threads is limited by software and hardware. Threads consume processor and memory in varying degrees depending on what they're doing. The thread management software may impose its own limits as well.

The key thing to remember is the separation between logical/virtual parallelism and real/hardware parallelism. With your average OS, a system call is performed to spawn a new thread. What actually happens (whether it is mapped to a different core, a different hardware thread on the same core, or queued into the pool of software threads) is up to the OS.
Parallel processing uses all the methods not just multi-threading.
Generally speaking, if you want to have real parallel processing, you need to perform it in hardware. Take the example of the Niagara, it has up to 8-cores each capable of executing 4-threads in hardware.
Context switching is needed when there are more threads than is capable of being executed in parallel in hardware. Even then, when executed in series (switching between one thread to the next), they are considered concurrent because there is no guarantee on the order of switching. So, it may go T0, T1, T2, T1, T3, T0, T2 and so on. For all intents and purposes, the threads are parallel.
Time slicing.
That would be up to the OS.

Multithreading is the execution of more than one thread at a time. It can happen both on single core processors and the multicore processor systems. For single processor systems, context switching effects it. Look!Context switching in this computational environment refers to time slicing by the operating system. Therefore do not get confused. The operating system is the one that controls the execution of other programs. It allows one program to execute in the CPU at a time. But the frequency at which the threads are switched in and out of the CPU determines the transparency of parallelism exhibited by the system.
For multicore environment,multithreading occurs when each core executes a thread.Though,in multicore again,context switching can occur in the individual cores.

I think answers so far are pretty much to the point and give you a good basic context. In essence, say you have quad core processor, but each core is capable of executing 2 simultaneous threads.
Note, that there is only slight (or no) increase of speed if you are running 2 simultaneous threads on 1 core versus you run 1st thread and then 2nd thread vertically. However, each physical core adds speed to your general workflow.
Now, say you have a process running on your OS that has multiple threads (i.e. needs to run multiple things in "parallel") and has some kind of stack of tasks in a queue (or some other system with priority rules). Then software sends tasks to a queue and your processor attempts to execute them as fast as it can. Now you have 2 cases:
If a software supports multiprocessing, then tasks will be sent to any available processor (that is not doing anything or simply finished doing some other job and job send from your software is 1st in a queue).
If your software does not support multiprocessing, then all of your jobs will be done in a similar manner, but only by one of your cores.
I suggest reading Wikipedia page on thread. Very first picture there already gives you a nice insight. :)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string