[zephyr-rtos][riot-os] Zephyr vs. RIOT OS - multithreading

Hello everyone,
I'm Luiz Villa a researcher on software defined power electronics at the University of Toulouse. My team is working on trying to embed an RTOS onto a micro-controller in order to create a more friendly development process of embedded control in power electronics.
We are trying as much as possible to avoid using ISRs for two reasons:
It makes it easier to collaborate in software development (our project is open-source)
Interrupts make the code execution time non-deterministic (which we wish to avoid)
We would like to make a benchmark between Zephyr and RIOT-OS in terms of thread speed. We need a code that runs at 20kHz with two to three threads doing :
ADC acquisition and data averaging
Mathematical calculations for control (using CMSIS)
Communication with the outside
Since time is such a critical element for us, we need to know:
What is the minimum time for executing a thread in Zephyr and RIOT-OS?
The time required to switch between threads in Zephyr and RIOT-OS?
Our preliminary results show that:
When testing with a single thread and a sleep time of 0us, Zephyr has a period of 9us and riot 5us
When testing with a single thread and a sleep time of 10us, Zephyr has a period of 39us and riot 15us
We use a Nucleo-G474RE with the following code: https://gitlab.laas.fr/owntech/zephyr/-/tree/test_adc_g4
We are quite surprised with our results, since we expected both OS to consume much less resources than they do.
What do you think? Have you tried running any of these OS as fast as possible? What were your results? Have you tested Zephyr's thread switching time?
Thanks for reading
Luiz

Disclaimer: I'm a RIOT core developer.
The time required to switch between threads in Zephyr and RIOT-OS?
When testing with a single thread and a sleep time of 0us, Zephyr has a period of 9us and riot 5us
This seems about right.
If I run one of RIOT's own scheduling microbenchmarks (e.g., tests/bench_mutex_pingpong), on the nucleo-f401re (a 84MHz STM32F4 / Cortex-M4),
this is the result:
main(): This is RIOT! (Version: 2021.04-devel-1250-gc8cb79c)
main starting
{ "result" : 157303, "ticks" : 534 }
The tests measures how many times a thread switches to another thread and back.
One iteration (two context switches) take ~534 clock cycles, or 1000000/154303 = ~6.36us, which is close to the number you got.
This is the context switch overhead. A thread's registers and state are stored on its stack, the scheduler runs to figure out the next runnable thread, and restores that thread's registers and state.
I'm surprised Zephyr isn't closer to RIOT. Maybe check if it was compiled with optimization enabled, or if some enabled features increase the switching overhead (e.g., is the MPU enabled?).
What is the minimum time for executing a thread in Zephyr and RIOT-OS?
Whatever is left after ISR's have been served and contexts have been switched.
What do you think?
Having three threads scheduled at 20KHz and having them do actual work is gonna be tight with an Cortex-M on either Zephyr or RIOT, so I think you should re-architecture your application.
Having multiple threads for separating them logically is very nice, but here a classical main loop might be a better choice.
Something like this (pseudocode):
void loop() {
while(1) {
handle_adc();
do_dsp_computation();
send_data();
periodic_sleep_us(50);
}
}

Related

Does multithreading actually work in uniprocessor environment

Suppose we have a process with multiple threads in a uniprocessor.
Now I know that if we have several processes, only one of them will be processed at a time in a uniprocessor and hence the processes are not concurrent.
If my understanding is correct, similarly each thread will be processed at a time and not concurrent in a uniprocessor. Is this statement true? If so then does multithreading mean having more than one thread in a process and does not mean running multiple threads at a time? And does that mean there's no benefit of creating user threads in a uniprocessor environment?
TL;DR: threads are switching more often than processes and in real time we have an effect of concurrency because it is happens really fast.
when you wrote:
each thread will be processed at a time and not concurrent in a uni processor
Notice the word "concurrent", there is no real concurrency in uni processor, there is only effect of that thanks to the multiple number of context switches between processes.
Let's clarify something here, the single core of the CPU can handle one thread at a given time, each process has a main thread and (if needed) more threads running together. If a process A is now running and it has 3 threads: A1(main thread), A2, A3 all three will be running as long as process A is being processed by the CPU core. When a context switch occur process A is no longer running and now process B will run with his threads.
About this statement:
there's no benefit of creating user threads in a uni processor environment
That is not true. there is a benefit in creating threads, they are easier to create ("spawn" as in the books) and shearing the process heap memory. Creating a sub process ("child" as in the books) is a overhead comparing to a thread because a process need to have his own memory. For example each google chrome tab is a process not a thread, but this tab has multiple threads running concurrency with little responsibility.
If you are still somehow running a computer with just one, single-core, CPU, then you would be correct to observe that only one thread can be physically executing at one time. But that does not negate the value of breaking up the application into multiple threads and/or processes.
The essential benefit is concurrency. When one thread is waiting (e.g. for an input/output operation to complete), there is something else for the CPU to be doing in the meantime: it can be running a different thread that isn't waiting. With a carefully designed application, you can get much better utilization of every part of the hardware, more parallelism, and thus, more throughput.
My favorite go-to example is a fast food restaurant. About a dozen workers, each one doing different things, cooperate to bring your order to you. Even if one of them (say, "the fry guy") is standing around, someone else always has something to do. Several orders are in-process at once. This overlap, this "concurrency," is what you are shooting for – regardless of how many CPUs you have.
Multithreading is also commonly used with GUI applications that also need to do some kind of "heavy lifting." One thread handles the GUI interaction (and has no other real responsibilities) while other threads, with a slightly inferior priority (or "niceness") do the lifting. When a GUI event comes in, the GUI thread pre-empts the others and responds to it immediately, then of course goes right back to sleep again. But in this way the GUI always remains very responsive – even though the other threads are doing "heavy lifting" things, GUI messages are still handled very promptly. (I scooped-up about a 25% performance improvement by re-tooling an older application to use this approach, because the application was no longer "polling" for GUI events.)
The first question I ask about any thread is, "what does it wait for?" To me, a thread is defined by what event it waits for and what it does when that event happens.
Threads were in wide-spread use for at least a decade before multi-processor computers became commercially available. They are useful when you want to write a program that has to respond to un-synchronized events that come from multiple different sources. There's a few different ways to model a program like that. One way is to have a different thread to wait on each different event source. The next most popular is an event driven architecture in which there's a main loop that waits for all events and calls different event handler functions for each of the different kinds of event.
The multi-threaded style of program often is easier to read* because there's usually different activities going on inside the program, and the state of each activity can be implicit in the context (i.e., registers and call stack) of the thread that's driving it, while in the event-driven model, each activity's state must be explicitly encoded in some object.
The implicit-in-the-context way of keeping the state is much closer to the procedural style of coding a single activity that we learn as beginners.
*Easier to read does not mean that the code is easy to write without making bad and non-obvious mistakes!!
The main impetus for developing threads was Ada compliance. Prior to that, different operating systems had their own ways of handing multiple things at once. In eunuchs, the way to do more than one thing was to spin off a new process. In VMS, software interrupts (aka Asynchronous System Traps or Asynchronous Procedure Calls in Windoze). In those days (1970's) multiprocessor systems were rare.
One of the goals of Ada was to have a system independent way of doing things. It adopted the "task" which is effectively a thread. In order to support Ada, compiler developers had to include task (thread) libraries.
With the rise of multiprocessors, operating systems started to make threads (rather than processes) the basic schedulable unit in a system.
Threads then give a way for programs to handle multiple things simultaneously, even if there is only one processor. Sadly, support for threads in programming languages has been woefully lacking. Ada is the only major language I can think of that has real support for threads (tasks). Thread support in Java, for example, is a complete, sick joke. The result is threads are not as effective in practice as they could be.

Does a thread - run at the same speed (in relation to all other processor tasks)?

I am wondering if a thread are running the same speed in relation to all other threads in a computer? They all run on the same processor after all so if a thread "lags" - I think that would mean that everything preformed by the processor lagged at that moment? I think this a correct assumption? Or am I wrong?
First of all, there are several layers to think about: the concept "thread" exists within your programming language, or maybe within some "framework" of your language; within the operating system and even within the hardware.
The key thing is: in the end, down in the real hardware, the CPU simply has various elements of its pipes, registers, ... duplicated. Meaning: each thread is running on its own distinct hardware; and that hardware is (most likely) identical for each thread. Therefore: all threads should be moving with the same speed (the clock speed of the CPU that is).
But of course, there are other concepts that come into play; for example thread priority. Most operating system allow you to say: "this thread is more important than another thread". And that means that the more important thread might receive more CPU cycles than others. But that also depends on the overall load the CPU is currently facing.
Finally: you have to be clear about terminology. A thread is nothing but an abstraction for some "thread of activity". A thread is meaningless until you consider that it is executing certain code. So a thread itself is not "lagging"; but the code it is executing might for example lead to a lot of waiting (for example when doing IO). And then you observe the thread "not moving". But that isn't caused by the "speed" which the thread runs with, but with what the thread is actually doing!
Regarding priorities: that very much depends on the operating system / framework. Basically there would be one component measuring "consumption" and making decisions on that. The easiest model is that threads receive time slices; and of course: then some manager can decide how to distribute the available time slices to the threads waiting for processor time. You might look here (pretty old material, but still a good read); or there for a specific implementation example.

What's the point of multi-threading on a single core?

I've been playing with the Linux kernel recently and diving back into the days of OS courses from college.
Just like back then, I'm playing around with threads and the like. All this time I had been assuming that threads were automatically running concurrently on multiple cores but I've recently discovered that you actually have to explicitly code for handling multiple cores.
So what's the point of multi-threading on a single core? The only example I can think of is from college when writing a client/server program but that seems like a weak point.
All this time I had been assuming that threads were automatically
running concurrently on multiple cores but I've recently discovered
that you actually have to explicitly code for handling multiple cores.
The above is incorrect for any widely used, modern OS. All of Linux's schedulers, for example, will automatically schedule threads on different cores and even automatically move threads from one core to another when necessary to maximize core usage. There are some APIs that allow you to modify the schedulers' behavior, but these APIs are generally used to disable automatic thread-to-core scheduling, not to enable it.
So what's the point of multi-threading on a single core?
Imagine you have a GUI program whose purpose is to execute an expensive computation (for example, render a 3D image or a Mandelbrot set) and then display the result. Let's say this computation takes 30 seconds to complete on this particular CPU. If you implement that program the obvious way, and use only a single thread, then the user's GUI controls will be unresponsive for 30 seconds while the calculation is executing -- the user will be unable to do anything with your program, and possibly unable to do anything with his computer at all. Since users expect GUI controls to be responsive at all times, that would be a poor user experience.
If you implement that program with two threads (one GUI thread and one rendering thread), on the other hand, the user will be able to click buttons, resize the window, quit the program, choose menu items, etc, even while the computation is executing, because the OS is able to wake up the GUI thread and allow it to handle mouse/keyboard events when necessary.
Of course, it is possible to write this program with a single thread and keep its GUI responsive, by writing your single thread to do just a few milliseconds worth of computation, then check to see if there are GUI events available to process, handling them, then going back to do a bit more computation, etc. But if you code your app this way, you are essentially writing your own (very primitive) thread scheduler inside your app anyway, so why reinvent the wheel?
The first versions of MacOS were designed to run on a single core, but had no real concept of multithreading. This forced every application developer to correctly implement some manual thread management -- even if their app did not have any extended computations, they had to explicitly indicate when they were done using the CPU, e.g. by calling WaitNextEvent. This lack of multithreading made early (pre-MacOS-X) versions of MacOS famously unreliable at multitasking, since just one poorly written application could bring the whole computer to a grinding halt.
First, a program not only computes, but also waits for input/output and so can be considered as executing on an I/O processor. So even single-core machine is a multi-processor machine, and employing of multi-threading is justified.
Second, a task can be divided in several threads in the sake of modularity.
Multithreading is not only for taking advantage of multiple cores.
You need multiple processes for multitasking. For similar reason you are allowed to have multiple threads, which are lightweight compared with processes.
You probably don't want to spawn processes all the time for things like blocking I/O. That may be overkill.
And there is fiber, which is even more lightweight. So we have process, thread, and fiber for different levels of needs.
Well, when you say multithreading on a single core, there are things you need to consider. For example, the thread API that you are using - is it user level or kernel level. Most probably from you question I believe you are using user level threads.
Now, user level threads, depending upon the host OS or the API itself may map to single kernel thread or multiple. Many relations are possible like 1-1,many-1 or many-many.
Now, if there is a single core, your OS can still provide you several Kernel level threads which may behave as multiple processes to the CPU. In which case, OS will give you a time-slicing (and multi-programming) on the kernel threads leading to superfast context switch and via the user level API - you/your code will seem to have multithreaded features.
Also note that eventhough your processor is a single core, depending on the make, it can be hyperthreaded and have super deep pipelines allowing the concurrent running of Kernel threads with very low overhead.
For references: Check Intel/AMD architecture and how various OS provide Kernel threads.

Why are OS threads considered expensive?

There are many solutions geared toward implementing "user-space" threads. Be it golang.org goroutines, python's green threads, C#'s async, erlang's processes etc. The idea is to allow concurrent programming even with a single or limited number of threads.
What I don't understand is, why are the OS threads so expensive? As I see it, either way you have to save the stack of the task (OS thread, or userland thread), which is a few tens of kilobytes, and you need a scheduler to move between two tasks.
The OS provides both of this functions for free. Why should OS threads be more expensive than "green" threads? What's the reason for the assumed performance degradation caused by having a dedicated OS thread for each "task"?
I want to amend Tudors answer which is a good starting point. There are two main overheads of threads:
Starting and stopping them. Involves creating a stack and kernel objects. Involves kernel transitions and global kernel locks.
Keeping their stack around.
(1) is only a problem if you are creating and stopping them all the time. This is solved commonly using thread pools. I consider this problem to be practically solved. Scheduling a task on a thread pool usually does not involve a trip to the kernel which makes it very fast. The overhead is on the order of a few interlocked memory operations and a few allocations.
(2) This becomes important only if you have many threads (> 100 or so). In this case async IO is a means to get rid of the threads. I found that if you don't have insane amounts of threads synchronous IO including blocking is slightly faster than async IO (you read that right: sync IO is faster).
Saving the stack is trivial, no matter what its size - the stack pointer needs to be saved in the Thread Info Block in the kernel, (so usualy saving most of the registers as well since they will have been pushed by whatever soft/hard interrupt caused the OS to be entered).
One issue is that a protection level ring-cycle is required to enter the kernel from user. This is an essential, but annoying, overhead. Then the driver or system call has to do whatever was requested by the interrupt and then the scheduling/dispatching of threads onto processors. If this results in the preemption of a thread from one process by a thread from another, a load of extra process context has to be swapped as well. Even more overhead is added if the OS decides that a thread that is running on another processor core than the one handling the interrupt mut be preempted - the other core must be hardware-interrupted, (this is on top of the hard/soft interrupt that entred the OS in the first place.
So, a scheduling run may be quite a complex operation.
'Green threads' or 'fibers' are, (usually), scheduled from user code. A context-change is much easier and cheaper than an OS interrupt etc. because no Wagnerian ring-cycle is required on every context-change, process-context does not change and the OS thread running the green thread group does not change.
Since something-for-nothing does not exist, there are problems with green threads. They ar run by 'real' OS threads. This means that if one 'green' thread in a group run by one OS thread makes an OS call that blocks, all green threads in the group are blocked. This means that simple calls like sleep() have to be 'emulated' by a state-machine that yields to other green threads, (yes, just like re-implementing the OS). Similarly, any inter-thread signalling.
Also, of course, green threads cannot directly respond to IO signaling, so somewhat defeating the point of having any threads in the first place.
There are many solutions geared toward implementing "user-space" threads. Be it golang.org goroutines, python's green threads, C#'s async, erlang's processes etc. The idea is to allow concurrent programming even with a single or limited number of threads.
It's an abstraction layer. It's easier for many people to grasp this concept and use it more effectively in many scenarios. It's also easier for many machines (assuming a good abstraction), since the model moves from width to pull in many cases. With pthreads (as an example), you have all the control. With other threading models, the idea is to reuse threads, for the process of creating a concurrent task to be inexpensive, and to use a completely different threading model. It's far easier to digest this model; there's less to learn and measure, and the results are generally good.
What I don't understand is, why are the OS threads so expensive? As I see it, either way you have to save the stack of the task (OS thread, or userland thread), which is a few tens of kilobytes, and you need a scheduler to move between two tasks.
Creating a thread is expensive, and the stack requires memory. As well, if your process is using many threads, then context switching can kill performance. So lightweight threading models became useful for a number of reasons. Creating an OS thread became a good solution for medium to large tasks, ideally in low numbers. That's restrictive, and quite time consuming to maintain.
A task/thread pool/userland thread does not need to worry about much of the context switching or thread creation. It's often "reuse the resource when it becomes available, if it's not ready now -- also, determine the number of active threads for this machine".
More commmonly (IMO), OS level threads are expensive because they are not used correctly by the engineers - either there are too many and there is a ton of context switching, there is competition for the same set of resources, the tasks are too small. It takes much more time to understand how to use OS threads correctly, and how to apply that best to the context of a program's execution.
The OS provides both of this functions for free.
They're available, but they are not free. They are complex, and very important to good performance. When you create an OS thread, it's given time 'soon' -- all the process' time is divided among the threads. That's not the common case with user threads. The task is often enqueued when the resource is not available. This reduces context switching, memory, and the total number of threads which must be created. When the task exits, the thread is given another.
Consider this analogy of time distribution:
Assume you are at a casino. There are a number people who want cards.
You have a fixed number of dealers. There are fewer dealers than people who want cards.
There is not always enough cards for every person at any given time.
People need all cards to complete their game/hand. They return their cards to the dealer when their game/hand is complete.
How would you ask the dealers to distribute cards?
Under the OS scheduler, that would be based on (thread) priority. Every person would be given one card at a time (CPU time), and priority would be evaluated continually.
The people represent the task or thread's work. The cards represent time and resources. The dealers represent threads and resources.
How would you deal fastest if there were 2 dealers and 3 people? and if there were 5 dealers and 500 people? How could you minimize running out of cards to deal? With threads, adding cards and adding dealers is not a solution you can deliver 'on demand'. Adding CPUs is equivalent to adding dealers. Adding threads is equivalent to dealers dealing cards to more people at a time (increases context switching). There are a number of strategies to deal cards more quickly, especially after you eliminate the people's need for cards in a certain amount of time. Would it not be faster to go to a table and deal to a person or people until their game is complete if the dealer to people ratio were 1/50? Compare this to visiting every table based on priority, and coordinating visitation among all dealers (the OS approach). That's not to imply the OS is stupid -- it implies that creating an OS thread is an engineer adding more people and more tables, potentially more than the dealers can reasonably handle. Fortunately, the constraints may be lifted in many cases by using other multithreading models and higher abstractions.
Why should OS threads be more expensive than "green" threads? What's the reason for the assumed performance degradation caused by having a dedicated OS thread for each "task"?
If you developed a performance critical low level threading library (e.g. upon pthreads), you would recognize the importance of reuse (and implement it in your library as a model available for users). From that angle, the importance of higher level multithreading models is a simple and obvious solution/optimization based on real world usage as well as the ideal that the entry bar for adopting and effectively utilizing multithreading can be lowered.
It's not that they are expensive -- the lightweight threads' model and pool is a better solution for many problems, and a more appropriate abstraction for engineers who do not understand threads well. The complexity of multithreading is greatly simplified (and often more performant in real world usage) under this model. With OS threads, you do have more control, but several more considerations must be made to use them as effectively as possible -- heeding these consideration can dramatically reflow a program's execution/implementation. With higher level abstractions, many of these complexities are minimized by completely altering the flow of task execution (width vs pull).
The problem with starting kernel threads for each small task is that it incurs a non-negligible overhead to start and stop, coupled with the stack size it needs.
This is the first important point: thread pools exist so that you can recycle threads, in order to avoid wasting time starting them as well as wasting memory for their stacks.
Secondly, if you fire off threads to do asynchronous I/O, they will spend most of their time blocked waiting for the I/O to complete, thus effectively not doing any work and wasting memory. A much better option is to have a single worker handle multiple async calls (through some under-the-hood scheduling technique, such as multiplexing), thus again saving memory and time.
One thing that makes "green" threads faster than kernel threads is that they are user-space objects, managed by a virtual machine. Starting them is a user space call, while starting a thread is a kernel-space call that is much slower.
A person in Google shows an interesting approach.
According to him, kernel mode switching itself is not the bottleneck, and the core cost happen on SMP scheduler. And he claims M:N schedule assisted by kernel wouldn't be expensive, and this makes me to expect general M:N threading to be available on every languages.
Because the OS. Imagine that instead of asking you to clean the house your grandmother has to call the social service that does some paperwork and a week after assigns a social worker for helping her. The worker can be called off at any time and replaced with another one, which again takes several days.
That's pretty ineffective and slow, huh?
In this metaphor you are a userland coroutine scheduler, the social service is an OS with its kernel-level thread scheduler, and a social worker is a fully-fledged thread.
I think the two things are in different levels.
Thread or Process is an instance of the program which is being executed. In a process/thread there is much more things in it. Execution stack, opening files, signals, processors status, and a many other things.
Greentlet is different, it is runs in vm. It supplies a light-weight thread. Many of them supply a pseudo-concurrently (typically in a single or a few OS-level threads). And often they supply a lock-free method by data-transmission instead of data sharing.
So, the two things focus different, so the weight are different.
And In my mind, the greenlet should be finished in the VM not the OS.

Threads & Processes Vs MultiThreading & Multi-Core/MultiProcessor : How they are mapped?

I was very confused but the following thread cleared my doubts:
Multiprocessing, Multithreading,HyperThreading, Multi-core
But it addresses the queries from the hardware point of view. I want to know how these hardware features are mapped to software?
One thing that is obvious is that there is no difference between MultiProcessor(=Mutlicpu) and MultiCore other than that in multicore all cpus reside on one chip(die) where as in Multiprocessor all cpus are on their own chips & connected together.
So, mutlicore/multiprocessor systems are capable of executing multiple processes (firefox,mediaplayer,googletalk) at the "sametime" (unlike context switching these processes on a single processor system) Right?
If it correct. I'm clear so far. But the confusion arises when multithreading comes into picture.
MultiThreading "is for" parallel processing. right?
What are elements that are involved in multithreading inside cpu? diagram? For me to exploit the power of parallel processing of two independent tasks, what should be the requriements of CPU?
When people say context switching of threads. I don't really get it. because if its context switching of threads then its not parallel processing. the threads must be executed "scrictly simultaneously". right?
My notion of multithreading is that:
Considering a system with single cpu. when process is context switched to firefox. (suppose) each tab of firefox is a thread and all the threads are executing strictly at the same time. Not like one thread has executed for sometime then again another thread has taken until the context switch time is arrived.
What happens if I run a multithreaded software on a processor which can't handle threads? I mean how does the cpu handle such software?
If everything is good so far, now question is HOW MANY THREADS? It must be limited by hardware, I guess? If hardware can support only 2 threads and I start 10 threads in my process. How would cpu handle it? Pros/Cons? From software engineering point of view, while developing a software that will be used by the users in wide variety of systems, Then how would I decide should I go for multithreading? if so, how many threads?
First, try to understand the concept of 'process' and 'thread'. A thread is a basic unit for execution: a thread is scheduled by operating system and executed by CPU. A process is a sort of container that holds multiple threads.
Yes, either multi-processing or multi-threading is for parallel processing. More precisely, to exploit thread-level parallelism.
Okay, multi-threading could mean hardware multi-threading (one example is HyperThreading). But, I assume that you just say multithreading in software. In this sense, CPU should support context switching.
Context switching is needed to implement multi-tasking even in a physically single core by time division.
Say there are two physical cores and four very busy threads. In this case, two threads are just waiting until they will get the chance to use CPU. Read some articles related to preemptive OS scheduling.
The number of thread that can physically run in concurrent is just identical to # of logical processors. You are asking a general thread scheduling problem in OS literature such as round-robin..
I strongly suggest you to study basics of operating system first. Then move on multithreading issues. It seems like you're still unclear for the key concepts such as context switching and scheduling. It will take a couple of month, but if you really want to be an expert in computer software, then you should know such very basic concepts. Please take whatever OS books and lecture slides.
Threads running on the same core are not technically parallel. They only appear to be executed in parallel, as the CPU switches between them very fast (for us, humans). This switch is what is called context switch.
Now, threads executing on different cores are executed in parallel.
Most modern CPUs have a number of cores, however, most modern OSes (windows, linux and friends) usually execute much larger number of threads, which still causes context switches.
Even if no user program is executed, still OS itself performs context switches for maintanance work.
This should answer 1-3.
About 4: basically, every processor can work with threads. it is much more a characteristic of operating system. Thread is basically: memory (optional), stack and registers, once those are replaced you are in another thread.
5: the number of threads is pretty high and is limited by OS. Usually it is higher than regular programmer can successfully handle :)
The number of threads is dictated by your program:
is it IO bound?
can the task be divided into a number of smaller tasks?
how small is the task? the task can be too small to make it worth to spawn threads at all.
synchronization: if extensive synhronization is required, the penalty might be too heavy and you should reduce the number of threads.
Multiple threads are separate 'chains' of commands within one process. From CPU point of view threads are more or less like processes. Each thread has its own set of registers and its own stack.
The reason why you can have more threads than CPUs is that most threads don't need CPU all the time. Thread can be waiting for user input, downloading something from the web or writing to disk. While it is doing that, it does not need CPU, so CPU is free to execute other threads.
In your example, each tab of Firefox probably can even have several threads. Or they can share some threads. You need one for downloading, one for rendering, one for message loop (user input), and perhaps one to run Javascript. You cannot easily combine them because while you download you still need to react to user's input. However, download thread is sleeping most of the time, and even when it's downloading it needs CPU only occasionally, and message loop thread only wakes up when you press a button.
If you go to task manager you'll see that despite all these threads your CPU use is still quite low.
Of course if all your threads do some number-crunching tasks, then you shouldn't create too many of them as you get no performance benefit (though there may be architectural benefits!).
However, if they are mainly I/O bound then create as many threads as your architecture dictates. It's hard to give advice without knowing your particular task.
Broadly speaking, yeah, but "parallel" can mean different things.
It depends what tasks you want to run in parallel.
Not necessarily. Some (indeed most) threads spend a lot of time doing nothing. Might as well switch away from them to a thread that wants to do something.
The OS handles thread switching. It will delegate to different cores if it wants to. If there's only one core it'll divide time between the different threads and processes.
The number of threads is limited by software and hardware. Threads consume processor and memory in varying degrees depending on what they're doing. The thread management software may impose its own limits as well.
The key thing to remember is the separation between logical/virtual parallelism and real/hardware parallelism. With your average OS, a system call is performed to spawn a new thread. What actually happens (whether it is mapped to a different core, a different hardware thread on the same core, or queued into the pool of software threads) is up to the OS.
Parallel processing uses all the methods not just multi-threading.
Generally speaking, if you want to have real parallel processing, you need to perform it in hardware. Take the example of the Niagara, it has up to 8-cores each capable of executing 4-threads in hardware.
Context switching is needed when there are more threads than is capable of being executed in parallel in hardware. Even then, when executed in series (switching between one thread to the next), they are considered concurrent because there is no guarantee on the order of switching. So, it may go T0, T1, T2, T1, T3, T0, T2 and so on. For all intents and purposes, the threads are parallel.
Time slicing.
That would be up to the OS.
Multithreading is the execution of more than one thread at a time. It can happen both on single core processors and the multicore processor systems. For single processor systems, context switching effects it. Look!Context switching in this computational environment refers to time slicing by the operating system. Therefore do not get confused. The operating system is the one that controls the execution of other programs. It allows one program to execute in the CPU at a time. But the frequency at which the threads are switched in and out of the CPU determines the transparency of parallelism exhibited by the system.
For multicore environment,multithreading occurs when each core executes a thread.Though,in multicore again,context switching can occur in the individual cores.
I think answers so far are pretty much to the point and give you a good basic context. In essence, say you have quad core processor, but each core is capable of executing 2 simultaneous threads.
Note, that there is only slight (or no) increase of speed if you are running 2 simultaneous threads on 1 core versus you run 1st thread and then 2nd thread vertically. However, each physical core adds speed to your general workflow.
Now, say you have a process running on your OS that has multiple threads (i.e. needs to run multiple things in "parallel") and has some kind of stack of tasks in a queue (or some other system with priority rules). Then software sends tasks to a queue and your processor attempts to execute them as fast as it can. Now you have 2 cases:
If a software supports multiprocessing, then tasks will be sent to any available processor (that is not doing anything or simply finished doing some other job and job send from your software is 1st in a queue).
If your software does not support multiprocessing, then all of your jobs will be done in a similar manner, but only by one of your cores.
I suggest reading Wikipedia page on thread. Very first picture there already gives you a nice insight. :)

Resources