What is the difference between CPU core's thread and any software's application thread? - multithreading

I have a web application which supports multi threading in which we can run async tasks simultaneously on different thread. I understood what that thread mean.
Now suppose the server on which the application is running has multiple cores CPU with hyper threading enabled.
Now, how my application is supposed to take advantage of these threads. Is there any relation between these two which I am missing.
What i understand from CPU's threads is that
A thread is a single line of commands that are getting processed, each application has at least one thread, most have multiples. A core is the physical hardware that works on the thread. In general a processor can only work on one thread per core, CPUs with hyper threading can work on up to two threads per core.
For processors with hyper threading, there are extra registers and execution units in the core so it can store the state of two threads and work on them both, normally to change threads you have to empty the registers into the cache, write that back to the main memory, then load up the cache with the new values and load up the registers, context switches hurt performance significantly.
But when you have too much backgrounds tasks running, how they are utilizing just limited number of core's threads (i.e. 2 to 8).
PS: I have already checked What is the difference between a process and a thread? and not looking for definition of process. So its not a duplicate.

If you are making use of multiple cores in your program, then the os will schedule which cores run which threads and will take many factors into account, including other processes running, what exactly your code is trying to do, and much more. In regards to async tasks, these may not necessarily be running on a different thread or core, they may be tasks that are not instantaneous, so some scheduler may decide to start doing other things until there is a signal that the async task is complete. It will vary widely depending on the language you are programming the web application in, and the implementation.

Related

How does multithreading utilizes multiple cores?

So recently I've learned some basic knowledge about multithreading. What I've understood is that thread is a lightweight process that runs under processes by sharing memory, while one process is running under one CPU core.
Yet by this perspective I couldn't understand some saying that threads utilizes multiple cores and make the whole program executes more effective. From what I've known, threads created by one process should run only under that specific process, which means that it should only run under that very one CPU core. If we want to utilize multiple cores, we should actually use multiprocess to run parallelly. Most of what I've researched is only about the conclusion, i.e multithreading utilizes multiple cores, but none of them explains my question. Did I think anything wrong? Thanks!
Your confusion lies here:
[...] while one process is running under one CPU core.
[...] threads created by one process should run only under that specific process, which means that it should only run under that very one CPU core.
This is not true. I think what the various explanations you have read meant that any process have at least one thread (where a 'thread' is a sequence of instructions ran by a CPU core).
If you have a multithreaded program, the process will have several threads (sequences of instructions ran by a CPU core) that can run concurrently on different CPU cores.
There are many processes executing on your computer at any given time. The Operating System (OS) is the program that allocates the hardware resources (CPU cores) to all these processes and decides which process can use which cores for what amount of time before another process gets to use the CPU. Whether or not a process gets to use multiple cores is not entirely up to the process. More confusing still, multithreaded programs can use more threads than there are cores on the computer's CPU. In that case you can be certain that all your threads do not run in parallel.
One more thing:
[...] threads utilizes multiple cores and make the whole program executes more effective
I am going to sound very pedantic, but it is more complicated than that. It depends on what you mean by "effective". Are we talking about total computation time, energy consumption ..?
A sequential (1 thread) program may be very effective in terms of power consumption but taking a very long time to compute. If you are able to use multiple threads, you may be able to reduce that computation time but it will probably incur new costs (synchronization between threads, additional protection mechanisms against concurrent accesses ...).
Also, multithreading cannot help for certain tasks that fall outside of the CPU realm. For example, unless you have some very specific hardware support, reading a file from the hard-drive with 2 or more concurrent threads cannot be parallelized efficiently.

Process vs thread with example

I read articles on processes vs threads, but I am still not clear on the difference.
Suppose a process is using the CPU/Processor, doing some big calculation that takes 10 minutes. How will another process run at the same time in parallel? In a single core vs a dual core processor?
Same thing for threads, how will another thread run in parallel when the CPU/Processor is engaged with another thread?
How is context switching different for threads and for processes? I mean both process and threads use the same RAM memory, so what's the difference?
From my vague memory of Operating Systems I can offer you a little bit of help. First you have to know the difference between concurrent and simultaneous. They are not the same thing; simultaneous means both things occur at the same time and concurrent means they appear to be running simultaneously but in reality they're switching so fast you can't tell.
Processes and threads can be considered similar, but a big difference is that a process is much larger than a thread. For that reason, it is not good to have switching between processes. There is too much information in a process that would have to be saved and reloaded each time the CPU decides to switch processes.
A thread on the other hand is smaller and so it is better for switching. A process may have multiple threads that run concurrently, meaning not at the same exact time, but run together and switch between them. The context switching here is better because a thread won't have as much information to store/reload.
If you only have a single core then you can only do concurrent execution, for the most part. Once you have multiple cores you can have threads run on both cores and thus have simultaneous execution. It is up to the Operating System to schedule when threads run, when processes get to run, when to switch, how to switch them, etc. The Operating System gives you the illusion that work is being done simultaneously when this is not always the case.
If you have more confusion feel free to comment.
A process is a thing very related to the Operating System (OS). The thread is in the simplest terms, is an executing program. One or more threads run in the context of the process. The Java Virtual Machine (JVM) is a process in your OS.
And inside the JVM you can have multiple threads running concurrently.
The processor is a resource of your machine, like the memory. Your OS let your process to share the available resources, in our simple case processors and memory.
When you develop in Java, all processor in your machine are available resources.
When you develop your solution, you can have even multiple Java processes (i.e. multiple JVM) running a single or multiple thread each. But this mostly depends by your problem.
The real difference between a process and a thread is that both have an executing program, but threads share the same memory. This let your threads to theoretically work on the same data, but you have pay the complexity of concurrency and synchronisation.
Each CPU only runs one thread in a process at a time. However the OS can stop and save a thread and load and run another quickly (as little as 0.0001 seconds) This gives the illusion that many threads are running at once, even though only one is running.

Node.js single thread VS Tranditonal webserver thread pool

I am a newbie to node.js. I am currently reading the book called 'Beignning Node.js' by Basarat Ali Syed.
Here is an excerpt from it which states the disadvantage of thread pool of traditional web servers:
Most web servers used thread pool this
method a few years back and many continue to use today. However,
this method is not without drawbacks. Again there is wasting of RAM
between threads. Also the OS needs to context switch between threads
(even when they are idle), and this results in wasted CPU resources.
I don't quite understand why there is context switch between threads inside thread pool. As far as I could understand, one thread will last during the duration of a task. And once the task is completed, the thread will be free to receive the next task.
So My Q1: Why does it need context switch? When will the context switch between threads happen?
My Q2: Why does not node.js use multiple threads to handle events in the event queue? Isn't it more efficient and reduce the queuing time of events?
Context switch is when the OS need to run more threads than there are CPU cores. Say for example you have 10 threads. And they are all busy (meaning none of them have finished completing their tasks). But your CPU is only a dual core CPU (assume no hyperthreading for simplicity). So, how can all 10 threads run? It's not possible!!
The answer is context switch. The OS, when presented with lots of processes and threads to execute, will allocate a certain amount of time for each thread to run. After this time the OS will switch to another thread so that all threads will get some time to use the CPU.
The term "context switch" refers to the fact that when the OS needs to give the CPU to another thread/process it needs to copy all the values in registers temporarily to that thread's memory otherwise the other process/thread will mess up the calculation of the switched thread when it resumes. The OS will also need to re-point the virtual memory tables so that two processes will not mess up each other's memory. How expensive this operation is depends on the CPU architecture. Some architectures like the Sparc are optimized for context switching. Hyperthreading is a feature that implements context switching in hardware so it's faster (but then again, you only get one extra context per CPU with Hyperthreading as implemented on Intel/AMD64 architecture).
Not using multiple threads completely avoids context switching. Especially if your program is the only program running. So on a single core CPU, a nonblocking, single-threaded program can often beat a multithreaded program.
However, it's rare to find a single core CPU these days. The ideal number of threads you'd want to run is equal to the number of cores you have. Doing so would also avoid context switching. But even so, getting a complex multithreaded program to run fast is not easy. It's easier to get a nonblocking singlethreaded program to run fast. And in most web applications a multithreaded program wouldn't have any advantage over a nonblocking singlethreaded program because they're both I/O bound.
A nonblocking singlethreaded program is basically implementing thread-like behavior in userspace using events. This is sometimes called "green threads" in languages that support syntax that make event-oriented programming look like multithreaded programming.

multithreading and multitasking on single core processor vs multicore processor

Definitions:
Process(task): ist a program in execution. e.g: Notepad
Thread: A thread is a single sequence of instructions. A process consists of one or more threads(but only one can execute at a time).
According to the lecture a single core processor can run a single process(task) at a time.Only one thread can execute at a time but the Operating system achieves Multithreading using time slicing(thread context switch). This Thread switching happens frequently enough that the user perceives the threads as running at the same time (but they aren't running parallel!)and it occurs inside the one process.A Process context switch is similar to thread context switch with a difference that it takes place between processes (example between mediaplayer und notepad) instead between threads.
I'm not sure if this example is valid : taking two processes e.g: Notepad and Mediaplayer on a single core processor. One can play music and write in a Notepad at the same time although the two processes aren't runnin parallely(Process context switching or multitasking).Inside the one process e.g :Mediaplayer one can listen to music and create playlists at the same time although the two threads aren't running parallely (Thread context switch or multithreading)
1st Question : Are my Information above right ?
2nd Question : would an Execution of Threads in a multicore Process look the same inside a one core but with a difference that the threads of different processes can run parallely?.Is multithreading here the process of running multiple threads simultaneously on difference processes or the process of swiching between threads on a one core ? The same Question would be also for Multitasking.
How would the Process context switch and thread context switch in this case take place ?
3rd Question: The Professor used the Term single threaded processor. Is this Term an another name for sigle core Processor ?
or
several threads belonging to the same process can be executed on several CPU cores simultaneously.Time slicing still happens on multicore systems. Say one have Process with 20 Threads running on a quadcore - the OS still has to schedule 21 Threads to run on only 4 cores.
A single-threaded process runs on only one single core at a time. But that doesn't mean it'll run on the same core until it exits. The OS might give him a time slice to run on Core 1 now, pause it, and give it another time slice on Core 2 later
note : I read a lot of books and i googled enough before i decided to ask here.
EDITED
Yes, you seem to have a good understanding of this topic (not sure if it is really interesting, though). However, you seem to overthink it. I suggest a simpler way of understanding the way it works on the modern systems (it is really wild west when you start look back, with the idea of light-weighted processess and such, but I will not talk about it).
The process is a shell. It's only purpose in life is to provide environment for threads. Only the threads are really executed, process itself is never executed. A single process can host multiple threads within it, and when it hosts only one thread, one can say process is executed - but it is simply a manner of saying. A CPU can only execute a thread, not a process.
Your professor, as they often do, makes misleading statements. There is no such thing as single-threaded processor. There are single and multicored processors, and those processors can be joined together to provide multi-processor environment. From the application developer perspective, a single CPU with 4 cores does not differ from 4 single-core CPUs. There are differences, of course - but usually not for the application developer.
Multitasking is a laymen term. It can mean whatever one wants it to mean, and better be avoided in non-specific contexts.
I hope I did clarify your confusiuon.
The answer to your questions is the following:
Q1: On a single core processor two tasks can't run parallelly in the form of executing two (processor) instructions at the same time, the only possible way of multithreading is time-slicing realized by the task-scheduler (of the OS), so in that case you are approximately right. I would complete your view on the subject with the fact, that nowadays almost none of the applications are single-threaded. I don't know if notepad uses multiple threads, but I'm pretty sure, media player is multithreaded, and the task scheduler schedules time slices between threads not processes. (Fun fact: a single-threaded .NET application already runs 4-5 threads.)
Q2: Task scheduler on any system tries to spread the load between available cores, so time slices will work most likely how you displayed above, but if a process executes an additional thread, it will be executed on the core with the least load over it. Multiple cores also mean, multiple (processor) instructions can and will be executed at the same time.
Q3: In practice multithreaded processor and multicore processor means something very similar, but not the same. You see for example Intel Core i3/i5/i7 CPUs are equipped with an internal pseudo-task-scheduler, which doubles the number of virtual cores by scheduling the execution of two threads on the same core, so for example my i5 system is 2 cored but 4 threaded.
your most of concepts seem valid with non standard terms.
here is explanation of what are threads and process and then multithreading
process is running instance of program is true
when there were no thread then resources were only distributed among processes.
Now processes have threads so resources are distributed to threads but isolation is same of process means two processes still need IPC to communicate with each other. You can say multithreading as lightweight processes which can be scheduled by operating system. multit-hreading is an extension of multi-tasking so if there is one core and two processes: one with two threads and one with 4 threads, the contention of accessing core is between 6 threads not 2 processes.
for thread switch and process switch see thread context switch vs process context switch

Threads & Processes Vs MultiThreading & Multi-Core/MultiProcessor : How they are mapped?

I was very confused but the following thread cleared my doubts:
Multiprocessing, Multithreading,HyperThreading, Multi-core
But it addresses the queries from the hardware point of view. I want to know how these hardware features are mapped to software?
One thing that is obvious is that there is no difference between MultiProcessor(=Mutlicpu) and MultiCore other than that in multicore all cpus reside on one chip(die) where as in Multiprocessor all cpus are on their own chips & connected together.
So, mutlicore/multiprocessor systems are capable of executing multiple processes (firefox,mediaplayer,googletalk) at the "sametime" (unlike context switching these processes on a single processor system) Right?
If it correct. I'm clear so far. But the confusion arises when multithreading comes into picture.
MultiThreading "is for" parallel processing. right?
What are elements that are involved in multithreading inside cpu? diagram? For me to exploit the power of parallel processing of two independent tasks, what should be the requriements of CPU?
When people say context switching of threads. I don't really get it. because if its context switching of threads then its not parallel processing. the threads must be executed "scrictly simultaneously". right?
My notion of multithreading is that:
Considering a system with single cpu. when process is context switched to firefox. (suppose) each tab of firefox is a thread and all the threads are executing strictly at the same time. Not like one thread has executed for sometime then again another thread has taken until the context switch time is arrived.
What happens if I run a multithreaded software on a processor which can't handle threads? I mean how does the cpu handle such software?
If everything is good so far, now question is HOW MANY THREADS? It must be limited by hardware, I guess? If hardware can support only 2 threads and I start 10 threads in my process. How would cpu handle it? Pros/Cons? From software engineering point of view, while developing a software that will be used by the users in wide variety of systems, Then how would I decide should I go for multithreading? if so, how many threads?
First, try to understand the concept of 'process' and 'thread'. A thread is a basic unit for execution: a thread is scheduled by operating system and executed by CPU. A process is a sort of container that holds multiple threads.
Yes, either multi-processing or multi-threading is for parallel processing. More precisely, to exploit thread-level parallelism.
Okay, multi-threading could mean hardware multi-threading (one example is HyperThreading). But, I assume that you just say multithreading in software. In this sense, CPU should support context switching.
Context switching is needed to implement multi-tasking even in a physically single core by time division.
Say there are two physical cores and four very busy threads. In this case, two threads are just waiting until they will get the chance to use CPU. Read some articles related to preemptive OS scheduling.
The number of thread that can physically run in concurrent is just identical to # of logical processors. You are asking a general thread scheduling problem in OS literature such as round-robin..
I strongly suggest you to study basics of operating system first. Then move on multithreading issues. It seems like you're still unclear for the key concepts such as context switching and scheduling. It will take a couple of month, but if you really want to be an expert in computer software, then you should know such very basic concepts. Please take whatever OS books and lecture slides.
Threads running on the same core are not technically parallel. They only appear to be executed in parallel, as the CPU switches between them very fast (for us, humans). This switch is what is called context switch.
Now, threads executing on different cores are executed in parallel.
Most modern CPUs have a number of cores, however, most modern OSes (windows, linux and friends) usually execute much larger number of threads, which still causes context switches.
Even if no user program is executed, still OS itself performs context switches for maintanance work.
This should answer 1-3.
About 4: basically, every processor can work with threads. it is much more a characteristic of operating system. Thread is basically: memory (optional), stack and registers, once those are replaced you are in another thread.
5: the number of threads is pretty high and is limited by OS. Usually it is higher than regular programmer can successfully handle :)
The number of threads is dictated by your program:
is it IO bound?
can the task be divided into a number of smaller tasks?
how small is the task? the task can be too small to make it worth to spawn threads at all.
synchronization: if extensive synhronization is required, the penalty might be too heavy and you should reduce the number of threads.
Multiple threads are separate 'chains' of commands within one process. From CPU point of view threads are more or less like processes. Each thread has its own set of registers and its own stack.
The reason why you can have more threads than CPUs is that most threads don't need CPU all the time. Thread can be waiting for user input, downloading something from the web or writing to disk. While it is doing that, it does not need CPU, so CPU is free to execute other threads.
In your example, each tab of Firefox probably can even have several threads. Or they can share some threads. You need one for downloading, one for rendering, one for message loop (user input), and perhaps one to run Javascript. You cannot easily combine them because while you download you still need to react to user's input. However, download thread is sleeping most of the time, and even when it's downloading it needs CPU only occasionally, and message loop thread only wakes up when you press a button.
If you go to task manager you'll see that despite all these threads your CPU use is still quite low.
Of course if all your threads do some number-crunching tasks, then you shouldn't create too many of them as you get no performance benefit (though there may be architectural benefits!).
However, if they are mainly I/O bound then create as many threads as your architecture dictates. It's hard to give advice without knowing your particular task.
Broadly speaking, yeah, but "parallel" can mean different things.
It depends what tasks you want to run in parallel.
Not necessarily. Some (indeed most) threads spend a lot of time doing nothing. Might as well switch away from them to a thread that wants to do something.
The OS handles thread switching. It will delegate to different cores if it wants to. If there's only one core it'll divide time between the different threads and processes.
The number of threads is limited by software and hardware. Threads consume processor and memory in varying degrees depending on what they're doing. The thread management software may impose its own limits as well.
The key thing to remember is the separation between logical/virtual parallelism and real/hardware parallelism. With your average OS, a system call is performed to spawn a new thread. What actually happens (whether it is mapped to a different core, a different hardware thread on the same core, or queued into the pool of software threads) is up to the OS.
Parallel processing uses all the methods not just multi-threading.
Generally speaking, if you want to have real parallel processing, you need to perform it in hardware. Take the example of the Niagara, it has up to 8-cores each capable of executing 4-threads in hardware.
Context switching is needed when there are more threads than is capable of being executed in parallel in hardware. Even then, when executed in series (switching between one thread to the next), they are considered concurrent because there is no guarantee on the order of switching. So, it may go T0, T1, T2, T1, T3, T0, T2 and so on. For all intents and purposes, the threads are parallel.
Time slicing.
That would be up to the OS.
Multithreading is the execution of more than one thread at a time. It can happen both on single core processors and the multicore processor systems. For single processor systems, context switching effects it. Look!Context switching in this computational environment refers to time slicing by the operating system. Therefore do not get confused. The operating system is the one that controls the execution of other programs. It allows one program to execute in the CPU at a time. But the frequency at which the threads are switched in and out of the CPU determines the transparency of parallelism exhibited by the system.
For multicore environment,multithreading occurs when each core executes a thread.Though,in multicore again,context switching can occur in the individual cores.
I think answers so far are pretty much to the point and give you a good basic context. In essence, say you have quad core processor, but each core is capable of executing 2 simultaneous threads.
Note, that there is only slight (or no) increase of speed if you are running 2 simultaneous threads on 1 core versus you run 1st thread and then 2nd thread vertically. However, each physical core adds speed to your general workflow.
Now, say you have a process running on your OS that has multiple threads (i.e. needs to run multiple things in "parallel") and has some kind of stack of tasks in a queue (or some other system with priority rules). Then software sends tasks to a queue and your processor attempts to execute them as fast as it can. Now you have 2 cases:
If a software supports multiprocessing, then tasks will be sent to any available processor (that is not doing anything or simply finished doing some other job and job send from your software is 1st in a queue).
If your software does not support multiprocessing, then all of your jobs will be done in a similar manner, but only by one of your cores.
I suggest reading Wikipedia page on thread. Very first picture there already gives you a nice insight. :)

Resources