Preemptive threads vs. non-preemptive threads - Linux

Can someone please explain the difference between the preemptive threading model and the non-preemptive threading model?
As per my understanding:
Non-preemptive threading model: once a thread is started, it cannot be stopped, nor can control be transferred to other threads, until the thread has completed its task.
Preemptive threading model: the runtime is allowed to step in and hand control from one thread to another at any time. Higher-priority threads are given precedence over lower-priority threads.
Can someone please:
Explain whether this understanding is correct.
Explain the advantages and disadvantages of both models.
An example of when to use which would be really helpful.
If I create a thread in Linux (System V or pthreads) without specifying any options (are there any?), is the threading model used by default the preemptive one?

No, your understanding isn't entirely correct. Non-preemptive (a.k.a. cooperative) threads typically yield control manually to let other threads run before they finish, though it is up to each thread to call yield() (or whatever the equivalent is) to make that happen.
Preemptive threading is simpler to program with. Cooperative threads have less overhead.
Normally use preemptive. If you find your design has a lot of thread-switching overhead, cooperative threads would be a possible optimization. In many (most?) situations, this will be a fairly large investment with minimal payoff though.
Yes, by default you'd get preemptive threading, though if you look around for the CThreads package, it supports cooperative threading. Few enough people (now) want cooperative threads that I'm not sure it's been updated within the last decade though...
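To make that concrete, here is a minimal pthreads sketch (my own illustration, not from any particular package): the NULL attribute argument just means "default attributes", and there is no attribute that makes the threads cooperative; the kernel schedules and preempts them.

    /* Minimal sketch: pthread_create with default attributes gives you
     * kernel-scheduled, preemptive threads. Compile with: gcc demo.c -pthread */
    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        long id = (long)arg;
        for (int i = 0; i < 3; i++)
            printf("thread %ld, iteration %d\n", id, i);  /* output may interleave arbitrarily */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, (void *)1L);    /* NULL = default attributes */
        pthread_create(&t2, NULL, worker, (void *)2L);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }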

Non-preemptive threads are also called cooperative threads. An example of these is POE (Perl). Another example is classic Mac OS (before OS X). Cooperative threads have exclusive use of the CPU until they give it up. The scheduler then picks another thread to run.
Preemptive threads can voluntarily give up the CPU just like cooperative ones, but when they don't, it will be taken from them, and the scheduler will start another thread. POSIX & SysV threads fall in this category.
Big advantages of cooperative threads are greater efficiency (on single-core machines, at least) and easier handling of concurrency: it only exists when you yield control, so locking isn't required.
Big advantages of preemptive threads are better fault tolerance: a single thread failing to yield doesn't stop all other threads from executing. Also normally works better on multi-core machines, since multiple threads execute at once. Finally, you don't have to worry about making sure you're constantly yielding. That can be really annoying inside, e.g., a heavy number crunching loop.
You can mix them, of course. A single preemptive thread can have many cooperative threads running inside it.

Using a non-preemptive model does not mean that the process never undergoes a context switch: when the process is waiting for I/O, the dispatcher still chooses another process to run according to the scheduling model. We simply have to trust each process to yield.
Non-preemptive:
Less context switching, and therefore less overhead, which can be noticeable compared with the preemptive model
Easier to handle, since it can be managed even on a single-core processor
Preemptive:
Advantages:
Priorities give us more control over which process runs
Better concurrency is a bonus
System calls can be handled without blocking the entire system
Disadvantages:
Requires more complex algorithms for synchronization, and critical-section handling is unavoidable
The overhead that comes with frequent context switching

In cooperative (non-preemptive) models, once a thread is given control it continues to run until it explicitly yields control or it blocks.
In a preemptive model, the virtual machine is allowed to step in and hand control from one thread to another at any time. Both models have their advantages and disadvantages.
Java threads are generally preemptive between priorities. A higher priority thread takes precedence over a lower priority thread. If a higher priority thread goes to sleep or blocks, then a lower priority thread can run (assuming one is available and ready to run).
However, as soon as the higher priority thread wakes up or unblocks, it will interrupt the lower priority thread and run until it finishes, blocks again, or is preempted by an even higher priority thread.
The Java Language Specification occasionally allows VMs to run lower-priority threads instead of a runnable higher-priority thread, but in practice this is unusual.
However, nothing in the Java Language Specification specifies what is supposed to happen with equal priority threads. On some systems these threads will be time-sliced and the runtime will allot a certain amount of time to a thread. When that time is up, the runtime preempts the running thread and switches to the next thread with the same priority.
On other systems, a running thread will not be preempted in favor of a thread with the same priority. It will continue to run until it blocks, explicitly yields control, or is preempted by a higher priority thread.
As for the advantages, both derobert and pooria have highlighted them quite clearly.

Related

Are coroutines single-threaded by nature?

According to Wikipedia, coroutines are based on cooperative multitasking, which makes them less resource-hungry than threads. No context switch, no blocking, no expensive system calls, no critical sections and so on.
In other words, all those coroutine benefits seem to come from disallowing multithreading in the first place. This makes coroutines single-threaded by nature: concurrency is achieved, but no true parallelism.
Is it true? Is it possible to implement coroutines by using multiple threads instead?
Coroutines allow multitasking without multithreading, but they don't disallow multithreading.
In languages that support both, a coroutine that is put to sleep can be re-awakened in a different thread.
The usual arrangement for CPU-bound tasks is to have a thread pool with about twice as many threads as you have CPU cores. This thread pool is then used to execute maybe thousands of coroutines simultaneously. The threads share a queue of coroutines ready to execute, and whenever a thread's current coroutine blocks, it just gets another one to work on from the queue.
In this situation you have enough busy threads to keep your CPU busy, and you still have thread context switches, but not enough of them to waste significant resources. The number of coroutine context switches is thousands of times higher.
Multiple coroutines can be mapped to a single OS thread. But a single OS thread can only utilize 1 CPU. So you need multiple OS threads to utilize multiple CPUs.
So if a coroutine scheduler needs to utilize multiple CPUs (very likely), it needs to make use of multiple OS threads.
Have a look at the Go scheduler and look for "M:N scheduler".

Benefits of user-level threads

I was looking at the differences between user-level threads and kernel-level threads, which I basically understood.
What's not clear to me is the point of implementing user-level threads at all.
If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?
I have read a couple of articles that stated user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).
This being said, what's the difference between a sequential execution of all the threads and a "parallel" execution of them, considering they cannot take advantage of multiple processors and independent scheduling?
An answer to a previously asked question (similar to mine) was something like:
No modern operating system actually maps n user-level threads to 1
kernel-level thread.
But for some reason, many people on the Internet state that user-level threads can never take advantage of multiple processors.
Could you help me understand this, please?
I strongly recommend Modern Operating Systems 4th Edition by Andrew S. Tanenbaum (starring in shows such as the debate about Linux; also participating: Linus Torvalds). Costs a whole lot of bucks but it's definitely worth it if you really want to know stuff. For eager students and desperate enthusiasts it's great.
Your questions answered
[...] what's not clear to me is the point of implementing User-level threads
at all.
Read my post. It is comprehensive, I daresay.
If the kernel is unaware of the existence of multiple threads within a
single process, then which benefits could I experience?
Read the section "Disadvantages" below.
I have read a couple of articles that stated that user-level
implementation of threads is advisable only if such threads do not
perform blocking operations (which would cause the entire process to
block).
Read the subsection "No coordination with system calls" in "Disadvantages."
All citations are from the book I recommended in the top of this answer, Chapter 2.2.4, "Implementing Threads in User Space."
Advantages
Enables threads on systems without threads
The first advantage is that user-level threads are a way to work with threads on a system without threads.
The first, and most obvious, advantage is that
a user-level threads package can be implemented on an operating system that does not support threads. All operating systems used to
fall into this category, and even now some still do.
No kernel interaction required
A further benefit is the light overhead when switching threads, as opposed to switching to the kernel mode, doing stuff, switching back, etc. The lighter thread switching is described like this in the book:
When a thread does something that may cause it to become blocked
locally, for example, waiting for another thread in its process to
complete some work, it calls a run-time system procedure. This
procedure checks to see if the thread must be put into blocked state.
If so, it stores the thread’s registers (i.e., its own) [...] and
reloads the machine registers with the new thread’s saved values. As soon as the stack
pointer and program counter have been switched, the new thread comes
to life again automatically. If the machine happens to have an
instruction to store all the registers and another one to load them
all, the entire thread switch can be done in just a handful of
instructions. Doing thread switching like this is at least an order of
magnitude—maybe more—faster than trapping to the kernel and is a
strong argument in favor of user-level threads packages.
This efficiency is also nice because it spares us from incredibly heavy context switches and all that stuff.
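To illustrate what such a user-space switch looks like, here is a small sketch (my own, not the book's code) using the POSIX ucontext API, which is obsolescent but still provided by glibc. swapcontext() saves the current registers and loads another context's registers, with no trap into the kernel scheduler for the switch itself.

    /* Sketch of a purely user-space "thread" switch using the ucontext API. */
    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, co_ctx;
    static char co_stack[64 * 1024];           /* stack for the user-level thread */

    static void co_body(void)
    {
        puts("user-level thread: running");
        swapcontext(&co_ctx, &main_ctx);       /* "yield" back to main */
        puts("user-level thread: resumed");
    }                                          /* returning falls back via uc_link */

    int main(void)
    {
        getcontext(&co_ctx);
        co_ctx.uc_stack.ss_sp = co_stack;
        co_ctx.uc_stack.ss_size = sizeof co_stack;
        co_ctx.uc_link = &main_ctx;            /* where to go when co_body returns */
        makecontext(&co_ctx, co_body, 0);

        swapcontext(&main_ctx, &co_ctx);       /* switch to the user-level thread */
        puts("main: got control back");
        swapcontext(&main_ctx, &co_ctx);       /* resume it once more */
        puts("main: done");
        return 0;
    }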
Individually adjusted scheduling algorithms
Also, since there is no central scheduling algorithm, every process can have its own scheduling algorithm and is far more flexible in its choices. In addition, the "private" scheduling algorithm is far more flexible concerning the information it gets from the threads: the amount of information can be adjusted manually and per process, so scheduling can be very fine-grained. This is because, again, there is no central scheduling algorithm that has to fit the needs of every process, be very general, and deliver adequate performance in every case. User-level threads allow an extremely specialized scheduling algorithm.
This is only restricted by the disadvantage "No automatic switching to the scheduler."
They [user-level threads] allow each process to have its own
customized scheduling algorithm. For some applications, for example,
those with a garbage-collector thread, not having to worry about a
thread being stopped at an inconvenient moment is a plus. They also
scale better, since kernel threads invariably require some table space
and stack space in the kernel, which can be a problem if there are a
very large number of threads.
Disadvantages
No coordination with system calls
The user-level scheduling algorithm has no idea whether some thread has called a blocking read system call. A kernel-level scheduling algorithm, on the other hand, would know, because it can be notified by the system call; both belong to the kernel code base.
Suppose that a thread reads from the keyboard before any keys have
been hit. Letting the thread actually make the system call is
unacceptable, since this will stop all the threads. One of the main
goals of having threads in the first place was to allow each one to
use blocking calls, but to prevent one blocked thread from affecting
the others. With blocking system calls, it is hard to see how this
goal can be achieved readily.
He goes on to say that system calls could be made non-blocking, but that would be very inconvenient, and compatibility with existing OSes would be drastically hurt.
Mr. Tanenbaum also says that the library wrappers around the system calls (as found in glibc, for example) could be modified to predict when a system call would block, using select, but he notes that this is inelegant.
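For illustration, a sketch of what such a wrapper could look like (hypothetical, not glibc's actual code): before issuing a read, the run-time system polls the descriptor with a zero-timeout select and, if the read would block, switches to another user-level thread instead.

    /* Illustrative sketch: use a zero-timeout select() to test whether read()
     * would block, so a user-level scheduler could switch to another thread
     * instead of blocking the whole process. */
    #include <sys/select.h>
    #include <unistd.h>

    /* Returns 1 if read(fd, ...) can proceed without blocking, 0 otherwise. */
    static int read_would_not_block(int fd)
    {
        fd_set readfds;
        struct timeval timeout = {0, 0};       /* poll: return immediately */

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        return select(fd + 1, &readfds, NULL, NULL, &timeout) > 0;
    }

    /* Hypothetical use inside a user-level threads package:
     *
     *   while (!read_would_not_block(fd))
     *       yield_to_another_user_thread();   // run-time system picks another thread
     *   n = read(fd, buf, len);               // now expected not to block
     */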
Building upon that, he says that threads block often, that frequent blocking requires many system calls, and that many system calls are bad. And without blocking, threads become less useful:
For applications that are essentially entirely CPU bound and rarely
block, what is the point of having threads at all? No one would
seriously propose computing the first n prime numbers or playing chess
using threads because there is nothing to be gained by doing it that
way.
Page faults block the whole process if the kernel is unaware of threads
The OS has no notion of threads. Therefore, if a page fault occurs, the whole process will be blocked, effectively blocking all user-level threads.
Somewhat analogous to the problem of blocking system calls is the
problem of page faults. [...] If the program calls or jumps to an
instruction that is not in memory, a page fault occurs and the
operating system will go and get the missing instruction (and its
neighbors) from disk. [...] The process is blocked while the necessary
instruction is being located and read in. If a thread causes a page
fault, the kernel, unaware of even the existence of threads, naturally
blocks the entire process until the disk I/O is complete, even though
other threads might be runnable.
I think this can be generalized to all interrupts.
No automatic switching to the scheduler
Since there is no per-process clock interrupt, a thread holds the CPU indefinitely unless some OS-dependent mechanism (such as a context switch to another process) occurs or it voluntarily releases the CPU.
This prevents the usual scheduling algorithms, including round-robin, from working.
[...] if a thread starts running, no other thread in that process
will ever run unless the first thread voluntarily gives up the CPU.
Within a single process, there are no clock interrupts, making it
impossible to schedule processes round-robin fashion (taking turns).
Unless a thread enters the run-time system of its own free will, the scheduler will never get a chance.
He says that a possible solution would be
[...] to have the run-time system request a clock signal (interrupt) once a
second to give it control, but this, too, is crude and messy to
program.
I would even go on further and say that such a "request" would require some system call to happen, whose drawback is already explained in "No coordination with system calls." If no system call then the program would need free access to the timer, which is a security hole and unacceptable in modern OSes.
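As a rough sketch of the "periodic clock signal" idea (my own illustration, not Tanenbaum's code): the run-time system asks for a SIGALRM every second with setitimer, and the handler merely sets a flag that is checked at safe points, since doing the switch inside the handler itself would be unsafe.

    /* Sketch: request a SIGALRM every second; the handler only sets a flag
     * that the user-level run-time system checks at safe points to decide
     * whether to switch threads. */
    #include <signal.h>
    #include <string.h>
    #include <sys/time.h>

    static volatile sig_atomic_t time_slice_expired = 0;

    static void on_alarm(int signo)
    {
        (void)signo;
        time_slice_expired = 1;                 /* checked later by the run-time system */
    }

    static void install_timer(void)
    {
        struct sigaction sa;
        struct itimerval itv = {{1, 0}, {1, 0}}; /* fire every 1 second */

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_alarm;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGALRM, &sa, NULL);
        setitimer(ITIMER_REAL, &itv, NULL);
    }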
What's not clear to me is the point of implementing user-level threads at all.
User-level threads largely came into the mainstream due to Ada and its requirement for threads (tasks in Ada terminology). At the time, there were few multiprocessor systems and most multiprocessors were of the master/slave variety. Kernel threads simply did not exist. User threads had to be created to implement languages like Ada.
If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?
If you have kernel threads, multiple threads within a single process can run simultaneously. With user threads, the threads always execute interleaved.
Using threads can simplify some types of programming.
I have read a couple of articles that stated user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).
That is true on Unix, though maybe not in all Unix implementations. User threads on many operating systems function perfectly well with blocking I/O.
This being said, what's the difference between a sequential execution of all the threads and a "parallel" execution of them, considering they cannot take advantage of multiple processors and independent scheduling?
With user threads, there is never parallel execution. With kernel threads, there can be parallel execution IF there are multiple processors. On a single-processor system, there is not much advantage to using kernel threads over user threads (contra: note the blocking I/O issue on Unix and user threads).
But for some reason, many people on the Internet state that user-level threads can never take advantage of multiple processors.
In user threads, the process manages its own "threads" by interleaving execution within itself. The process can only have a thread run in the processor that the process is running in.
If the operating system provides system services to schedule code to run on a different processor, user threads could run on multiple processors.
I conclude by saying that for practical purposes there are no advantages to user threads over kernel threads. There are those who will assert that there are performance advantages, but any such advantage would be system-dependent.

Thread priorities in Lua

I had a look at the Lua book and learned that multithreading in Lua is cooperative. What I couldn't find is information about thread priorities. I guess that threads with the same priority run until completion or until they yield, since multithreading is cooperative. What about a thread that has a higher priority than another one?
Is it able to interrupt the one with lower priority, or will it run next once the thread with lower priority has run to completion?
There are no native threads (preemptive multitasking) in Lua; there is, however, cooperative multitasking, as you said.
The difference between preemptive and cooperative multitasking is that in preemptive multitasking the "threads" are not necessarily allowed to run until completion, but can be preempted by other threads. This is done by the scheduler, which runs at regular intervals, switching one thread for another. This is where priorities come in: if a thread with higher priority wants to run, it can preempt an already running thread with lower priority, and the scheduler will choose that thread (depending on the scheduling strategy) the next time the scheduler runs.
In cooperative multitasking there does not have to be a scheduler (though for practical reasons it's usually a good idea to have one). There are, however, co-processes. A co-process is like a thread, except it cannot be preempted. It can either run to completion or yield to another co-process and allow that one to run.
So back to your question, if you want priorities with cooperative multitasking, you need to write a scheduler, which decides which co-process to run, given its priority, and you need to write your co-process, so they give up processing once in a while, and turn back control to the scheduler.
Edit
To clarify, there is a slight difference between non-preemptive multitasking and cooperative multitasking. Non-preemptive multitasking is a bit broader, as it allows both static scheduling and cooperative multitasking.
Static scheduling means that a scheduler can schedule periodic tasks, which can then run when a task yields, maybe with a higher priority.
Cooperative multitasking is also a type of non-preemptive multitasking. However, here tasks are only scheduled by the tasks themselves, and control is explicitly yielded from one task to another, though which task it yields to can be based on a priority.
In Lua, threads cannot run in parallel (i.e., on multiple cores) within one Lua state. There is no simultaneous execution, since it's cooperative multitasking: only when one thread suspends execution (yields) can another thread resume. At no point can two Lua threads execute concurrently within one Lua state.
What you're talking about is preemption - a scheduler interrupting one thread to let another one execute.

How do user level threads (ULTs) and kernel level threads (KLTs) differ with regards to concurrent execution?

Here's what I understand; please correct/add to it:
In pure ULTs, the multithreaded process itself does the thread scheduling. So the kernel essentially does not notice the difference and considers it a single-threaded process. If one thread makes a blocking system call, the entire process is blocked. Even on a multicore processor, only one thread of the process would be running at a time, unless the process is blocked. I'm not sure how ULTs are much help, though.
In pure KLTs, even if a thread is blocked, the kernel schedules another (ready) thread of the same process. (In case of pure KLTs, I'm assuming the kernel creates all the threads of the process.)
Also, using a combination of ULTs and KLTs, how are ULTs mapped into KLTs?
Your analysis is correct. The OS kernel has no knowledge of user-level threads. From its perspective, a process is an opaque black box that occasionally makes system calls. Consequently, if that program has 100,000 user-level threads but only one kernel thread, then the process can only run one user-level thread at a time, because there is only one kernel-level thread associated with it. On the other hand, if a process has multiple kernel-level threads, then it can execute multiple commands in parallel on a multicore machine.
A common compromise between these is to have a program request some fixed number of kernel-level threads, then have its own thread scheduler divvy up the user-level threads onto these kernel-level threads as appropriate. That way, multiple ULTs can execute in parallel, and the program can have fine-grained control over how threads execute.
As for how this mapping works - there are a bunch of different schemes. You could imagine that the user program uses any one of multiple different scheduling systems. In fact, if you do this substitution:
Kernel thread <---> Processor core
User thread <---> Kernel thread
Then any scheme the OS could use to map kernel threads onto cores could also be used to map user-level threads onto kernel-level threads.
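As a concrete (and heavily simplified) sketch of that mapping, here is the "fixed number of kernel threads plus a shared ready queue" arrangement in pthreads. Plain run-to-completion tasks stand in for user-level threads; a real ULT scheduler would additionally save and restore per-thread contexts.

    /* Minimal sketch of the M:N idea: N kernel threads (pthreads) share one
     * ready queue of user-level work items. */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_KERNEL_THREADS 4
    #define NUM_TASKS          16

    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static int next_task = 0;                  /* index into the "ready queue" */

    static void run_task(int id)               /* stand-in for one user-level thread */
    {
        printf("task %d running on kernel thread %lu\n",
               id, (unsigned long)pthread_self());
    }

    static void *kernel_thread_main(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&queue_lock);
            int task = next_task < NUM_TASKS ? next_task++ : -1;
            pthread_mutex_unlock(&queue_lock);
            if (task < 0)
                return NULL;                   /* queue drained */
            run_task(task);
        }
    }

    int main(void)
    {
        pthread_t workers[NUM_KERNEL_THREADS];
        for (int i = 0; i < NUM_KERNEL_THREADS; i++)
            pthread_create(&workers[i], NULL, kernel_thread_main, NULL);
        for (int i = 0; i < NUM_KERNEL_THREADS; i++)
            pthread_join(workers[i], NULL);
        return 0;
    }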
Hope this helps!
Before anything else, templatetypedef's answer is beautiful; I simply wanted to extend his response a little.
There is one area that I felt needed a little expansion: combinations of ULTs and KLTs. To understand their importance (what Wikipedia labels hybrid threading), consider the following examples:
Consider a multithreaded program (multiple KLTs) where there are more KLTs than available logical cores. In order to use every core efficiently, as you mentioned, you want the scheduler to switch out KLTs that are blocking for ones that are in a ready state and not blocking. This ensures the core reduces its idle time. Unfortunately, switching KLTs is expensive for the scheduler, and it consumes a relatively large amount of CPU time.
This is one area where hybrid threading can be helpful. Consider a multithreaded program with multiple KLTs and ULTs. Just as templatetypedef noted, only one ULT can be running at a time for each KLT. If a ULT is blocking, we still want to switch it out for one which is not blocking. Fortunately, ULTs are much more lightweight than KLTs, in the sense that fewer resources are assigned to a ULT and they require no interaction with the kernel scheduler. Essentially, it is almost always quicker to switch out ULTs than it is to switch out KLTs. As a result, we are able to significantly reduce a core's idle time relative to the first example.
Now, of course, all of this depends on the threading library being used to implement ULTs. There are two ways (that I can come up with) of "mapping" ULTs to KLTs.
A collection of ULTs for all KLTs
This situation is ideal on a shared-memory system. There is essentially a "pool" of ULTs to which each KLT has access. Ideally, the threading library's scheduler would assign ULTs to each KLT upon request, as opposed to the KLTs accessing the pool individually. The latter could cause race conditions or deadlocks if not implemented with locks or something similar.
A collection of ULTs for each KLT (Qthreads)
This situation is ideal on a distributed-memory system. Each KLT would have a collection of ULTs to run. The drawback is that the user (or the threading library) would have to divide the ULTs between the KLTs. This could result in load imbalance, since it is not guaranteed that all ULTs will have the same amount of work to complete and will finish in roughly the same amount of time. The solution to this is to allow ULT migration; that is, migrating ULTs between KLTs.

What is the difference between a thread and a fiber?

What is the difference between a thread and a fiber? I've heard of fibers from Ruby, and I've read that they're available in other languages. Could somebody explain to me in simple terms what the difference is between a thread and a fiber?
In the most simple terms, threads are generally considered to be preemptive (although this may not always be true, depending on the operating system) while fibers are considered to be light-weight, cooperative threads. Both are separate execution paths for your application.
With threads: the current execution path may be interrupted or preempted at any time (note: this statement is a generalization and may not always hold true depending on OS/threading package/etc.). This means that for threads, data integrity is a big issue because one thread may be stopped in the middle of updating a chunk of data, leaving the integrity of the data in a bad or incomplete state. This also means that the operating system can take advantage of multiple CPUs and CPU cores by running more than one thread at the same time and leaving it up to the developer to guard data access.
With fibers: the current execution path is only interrupted when the fiber yields execution (same note as above). This means that fibers always start and stop in well-defined places, so data integrity is much less of an issue. Also, because fibers are often managed in the user space, expensive context switches and CPU state changes need not be made, making changing from one fiber to the next extremely efficient. On the other hand, since no two fibers can run at exactly the same time, just using fibers alone will not take advantage of multiple CPUs or multiple CPU cores.
Threads use pre-emptive scheduling, whereas fibers use cooperative scheduling.
With a thread, the control flow could get interrupted at any time, and another thread can take over. With multiple processors, you can have multiple threads all running at the same time (true parallelism). As a result, you have to be very careful about concurrent data access, and protect your data with mutexes, semaphores, condition variables, and so on. It is often very tricky to get right.
With a fiber, control only switches when you tell it to, typically with a function call named something like yield(). This makes concurrent data access easier, since you don't have to worry about atomicity of data structures or mutexes. As long as you don't yield, there's no danger of being preempted and having another fiber trying to read or modify the data you're working with. As a result, though, if your fiber gets into an infinite loop, no other fiber can run, since you're not yielding.
You can also mix threads and fibers, which gives rise to the problems faced by both. Not recommended, but it can sometimes be the right thing to do if done carefully.
In Win32, a fiber is a sort of user-managed thread. A fiber has its own stack and its own instruction pointer, etc., but fibers are not scheduled by the OS: you have to call SwitchToFiber explicitly. Threads, by contrast, are preemptively scheduled by the operating system. So, roughly speaking, a fiber is a thread that is managed at the application/runtime level rather than being a true OS thread.
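To make the Win32 mechanism concrete, here is a minimal sketch using the fiber API mentioned above (ConvertThreadToFiber, CreateFiber, SwitchToFiber): control moves only where SwitchToFiber is called, never because the OS decided to preempt a fiber.

    /* Sketch of Win32 fibers: the main thread converts itself to a fiber,
     * creates a second fiber, and control only moves on SwitchToFiber(). */
    #include <windows.h>
    #include <stdio.h>

    static LPVOID main_fiber;

    static VOID CALLBACK fiber_proc(LPVOID param)
    {
        (void)param;
        printf("fiber: first run\n");
        SwitchToFiber(main_fiber);             /* cooperative hand-back */
        printf("fiber: resumed\n");
        SwitchToFiber(main_fiber);             /* a fiber proc must not return */
    }

    int main(void)
    {
        main_fiber = ConvertThreadToFiber(NULL);           /* make caller a fiber */
        LPVOID worker = CreateFiber(0, fiber_proc, NULL);  /* 0 = default stack   */

        SwitchToFiber(worker);
        printf("main: control is back\n");
        SwitchToFiber(worker);
        printf("main: done\n");

        DeleteFiber(worker);
        return 0;
    }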
The consequences are that fibers are cheaper and that the application has more control over scheduling. This can be important if the app creates a lot of concurrent tasks, and/or wants to closely optimise when they run. For example, a database server might choose to use fibers rather than threads.
(There may be other usages for the same term; as noted, this is the Win32 definition.)
First I would recommend reading this explanation of the difference between processes and threads as background material.
Once you've read that, it's pretty straightforward. Threads can be implemented either in the kernel or in user space, or the two can be mixed. Fibers are basically threads implemented in user space.
What is typically called a thread is a thread of execution implemented in the kernel: what's known as a kernel thread. The scheduling of a kernel thread is handled exclusively by the kernel, although a kernel thread can voluntarily release the CPU by sleeping if it wants to. A kernel thread has the advantage that it can use blocking I/O and let the kernel worry about scheduling. Its main disadvantage is that thread switching is relatively slow, since it requires trapping into the kernel.
Fibers are user space threads whose scheduling is handled in user space by one or more kernel threads under a single process. This makes fiber switching very fast. If you group all the fibers accessing a particular set of shared data under the context of a single kernel thread and have their scheduling handled by a single kernel thread, then you can eliminate synchronization issues since the fibers will effectively run in serial and you have complete control over their scheduling. Grouping related fibers under a single kernel thread is important, since the kernel thread they are running in can be pre-empted by the kernel. This point is not made clear in many of the other answers. Also, if you use blocking I/O in a fiber, the entire kernel thread it is a part of blocks including all the fibers that are part of that kernel thread.
In section 11.4 "Processes and Threads in Windows Vista" in Modern Operating Systems, Tanenbaum comments:
Although fibers are cooperatively scheduled, if there are multiple
threads scheduling the fibers, a lot of careful synchronization is
required to make sure fibers do not interfere with each other. To
simplify the interaction between threads and fibers, it is often
useful to create only as many threads as there are processors to run
them, and affinitize the threads to each run only on a distinct set of
available processors, or even just one processor. Each thread can
then run a particular subset of the fibers, establishing a
one-to-many relationship between threads and fibers which simplifies
synchronization. Even so there are still many difficulties with
fibers. Most Win32 libraries are completely unaware of fibers, and
applications that attempt to use fibers as if they were threads will
encounter various failures. The kernel has no knowledge of fibers,
and when a fiber enters the kernel, the thread it is executing on may
block and the kernel will schedule an arbitrary thread on the
processor, making it unavailable to run other fibers. For these
reasons fibers are rarely used except when porting code from other
systems that explicitly need the functionality provided by fibers.
Note that in addition to Threads and Fibers, Windows 7 introduces User-Mode Scheduling:
User-mode scheduling (UMS) is a
light-weight mechanism that
applications can use to schedule their
own threads. An application can switch
between UMS threads in user mode
without involving the system scheduler
and regain control of the processor if
a UMS thread blocks in the kernel. UMS
threads differ from fibers in that
each UMS thread has its own thread
context instead of sharing the thread
context of a single thread. The
ability to switch between threads in
user mode makes UMS more efficient
than thread pools for managing large
numbers of short-duration work items
that require few system calls.
More information about threads, fibers and UMS is available by watching Dave Probert: Inside Windows 7 - User Mode Scheduler (UMS).
Threads were originally created as lightweight processes. In a similar fashion, fibers are a lightweight thread, relying (simplistically) on the fibers themselves to schedule each other, by yielding control.
I guess the next step will be strands where you have to send them a signal every time you want them to execute an instruction (not unlike my 5yo son :-). In the old days (and even now on some embedded platforms), all threads were fibers, there was no pre-emption and you had to write your threads to behave nicely.
Threads are scheduled by the OS (pre-emptive). A thread may be stopped or resumed at any time by the OS, but fibers more or less manage themselves (co-operative) and yield to each other. That is, the programmer controls when fibers do their processing and when that processing switches to another fiber.
Threads generally rely on the kernel to interrupt the thread so that it or another thread can run (better known as preemptive multitasking), whereas fibers use cooperative multitasking, where it is the fiber itself that gives up its running time so that other fibers can run.
Some useful links explaining it better than I probably did are:
http://en.wikipedia.org/wiki/Fiber_(computer_science)
http://en.wikipedia.org/wiki/Computer_multitasking#Cooperative_multitasking.2Ftime-sharing
http://en.wikipedia.org/wiki/Pre-emptive_multitasking
The Win32 fiber definition is in fact the "Green Thread" definition established at Sun Microsystems. There is no need to waste the term fiber on a thread of some kind, i.e., a thread executing in user space under user code/thread-library control.
To clarify the argument look at the following comments:
With hyper-threading, a multi-core CPU can accept multiple threads and distribute them, one to each core.
A superscalar pipelined CPU accepts one thread for execution and uses instruction-level parallelism (ILP) to run the thread faster. We may assume that one thread is broken into parallel fibers running in parallel pipelines.
An SMT CPU can accept multiple threads and break them into instruction fibers for parallel execution on multiple pipelines, using the pipelines more efficiently.
We should assume that processes are made of threads and that threads should be made of fibers. With that logic in mind, using fibers for other sorts of threads is wrong.

Resources