Multiple threads on dual core cpu - multithreading

I'm studying threads and am slightly confused about one thing.
If I have a single process with multiple threads running on a dual/quad core CPU, will different threads run concurrently on different cores?
Thanks in advance.

It depends.
At least on Linux, each task gets assigned to a set of CPUs that it can execute on (processor affinity). And, at least on Linux, the scheduler will try to schedule the task on the same processor as last time, so that it gets the best benefit of CPU cache re-use. The hilarious thing is that it doesn't always rebalance when the system is under load, so it is possible to run one core quite hot and contested and leave three cores cool and relatively idle. (I've seen this exact behavior with the Folding@Home client.)
You can force the affinity you need with the pthread_setaffinity_np(3) routine for threaded applications or sched_setaffinity(2) for more traditional Unix-style fork(2)ed applications. Or you can use the taskset(1) program to set the affinity before or after starting an application. (Which is the approach I took with my silly Folding@Home client -- it was easy to modify the initscript to call taskset(1) to set the affinity of each client process correctly, so each client got its own core and didn't compete for resources with the other clients on different sibling HyperThreaded 'faked' execution cores.)
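As a rough sketch of the sched_setaffinity(2) approach, here is how it might look from Python, whose os.sched_setaffinity wraps that system call on Linux. The helper name pin_current_process is made up for this illustration, and the code assumes a Linux host.

```python
import os

def pin_current_process(cpus):
    """Restrict the caller to the given CPU numbers (Linux only)."""
    allowed = os.sched_getaffinity(0)    # CPUs the caller may currently use
    target = set(cpus) & allowed         # never request a CPU we cannot use
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, target)      # pid 0 refers to the caller
    return os.sched_getaffinity(0)       # report the mask now in effect
```

Calling pin_current_process({0}) would confine the caller to CPU 0, which is roughly what running a program under taskset -c 0 does from the shell.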

Yes
It depends on the language, the library, and the operating system, and whether the threaded application ever actually has multiple runnable threads at the same point in time, but usually the answer is "yes".

You can never be sure of that fact, but if it is processor-intensive (such as a game) then most likely yes.

In that case you need to keep every core's view of memory in sync, for example by using the volatile keyword (in Java), which ensures that every core reads the latest value from memory. Note that volatile in C and C++ does not give this guarantee.

Sometimes the threads will run concurrently, sometimes not. It's all up to the package you use, the operating system, and how CPU-intensive each thread is.

I think that you are losing sight of the idea behind concurrency; it's not that you are looking to run processes on multiple cores. Instead, you need to avoid blocking on one task the entire time. A perfect example of this is threading network listeners. You want to perform an accept, which will actually create a new client->server socket. After this you want to do some processing with that socket while still being able to take new connections. This is where you would want to spawn a thread to perform the processing so that the accept can get back to waiting for a new connection.
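The accept-then-hand-off pattern described above might be sketched like this in Python. The function names and the "echo in upper case" processing are invented for the example, and max_clients exists only so the sketch terminates; a real listener would loop forever.

```python
import socket
import threading

def handle_client(conn):
    """Process one client on its own thread: echo one message in upper case."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def serve(listener, max_clients):
    """Accept connections and hand each one to a new thread."""
    workers = []
    for _ in range(max_clients):
        conn, _addr = listener.accept()    # blocks until a client connects
        worker = threading.Thread(target=handle_client, args=(conn,))
        worker.start()                     # processing happens off this thread,
        workers.append(worker)             # so accept() can run again right away
    for worker in workers:
        worker.join()
```

The point is that the accepting thread only ever does accept and start, so a slow client delays its own worker and nobody else.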

Related

Why do we need semaphores on single cpu?

I have read that we use semaphores inside the Linux kernel, and I have read that semaphores have advantages even on a single CPU (where only one process/thread can run at a time). Can anyone please give me an example of a problem that a semaphore solves (inside the kernel)?
In my view, there can be a problem only if we have more than one CPU, because two processes may make system calls that use the same data structure and probably cause problems.
Thank you for your help!
You don't really need more than one CPU for concurrency. The multiple CPUs are really "an implementation detail," a piece of hardware quirkiness that you can abstract away from. Concurrency is a logical property of programs. You can have concurrency without multiple CPUs, and use multiple CPUs without "real concurrency".
Consider a web server. It has to be "concurrent," in the sense that it must serve multiple clients at once, hold information about multiple connections at once, and process multiple requests at once. You can have it literally do this, by having multiple CPUs all working at the same time. Yet, the program only has to appear to do multiple things at once. It could just as well be running on one CPU and context switching to fairly service all the work put to it. The fact that a web server does multiple things at once is part of its interface: the I/O for the connections is interleaved, if a request has exclusively locked a resource, another request won't start trying to manipulate that same resource, etc. Writing a web server without concurrency produces a program that is wrong.
Semaphores help you with concurrency, by letting you control the way processes access resources. You asked, if you had one process running, how another could run at the same time with only a single core. Well, as I said, concurrency doesn't need multiple cores. The first process can be paused, and the second one started while the first one is still unfinished. This is just an implementation detail; logically, to the program writer, the two processes are running simultaneously, whether there are multiple cores or not. If the program was written without semaphores (or had broken concurrency in some other way), it would be wrong, even on a single core. Physically, this will be because context switching can abruptly pause one computation and start another at any time, and, without semaphores, the newly live thread won't know what resources it can and cannot access. Logically, this will be because the processes are running simultaneously, once you abstract yourself away from the implementation, and, in general, processes running simultaneously can walk over each other if not properly synchronized.
For an example applicable to an OS kernel, consider that every process is logically running concurrently with every other process. A kernel provides the implementation that makes this concurrency work. A resource that two processes may want simultaneously is a hard drive. A semaphore might be used in the kernel to track whether a given drive is currently busy with a read or write. A process trying to read or write to the same disk will ask the kernel to do so, and the kernel can check the semaphore to see that the disk is still busy and force the offending process to wait. Now, an operating system does count as low level code, so in some places, yes, you might want to omit some otherwise vital concurrency safeguards when running on a single CPU, because your job is to handle such implementation details, but higher level parts may still use them.
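To make the "busy disk" idea concrete, here is a user-space sketch in Python (not kernel code). The names disk_free, log and write_record are invented for the illustration, and the list simply stands in for the disk.

```python
import threading

disk_free = threading.Semaphore(1)   # one "disk", so one permit
log = []                             # stands in for the disk's contents

def write_record(owner, parts):
    """Write every part of one record without another writer interleaving."""
    with disk_free:                  # wait here while the disk is busy
        for part in parts:
            log.append((owner, part))

def run_writers():
    """Start three competing writers and return what ended up on the 'disk'."""
    threads = [
        threading.Thread(target=write_record, args=(name, range(1000)))
        for name in ("A", "B", "C")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because each writer holds the semaphore for its whole record, the records come out as three contiguous runs, whether the threads were time-sliced on one core or ran on several.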
In contrast, consider a number-crunching program. Let's say it's processing each element of a huge array of data into an equal-sized array of modified data (a functional map operation). It can use multiple CPUs to do this more quickly, but it can also work on one CPU. The observable behavior of the program is the same, and you never get any idea that it's doing multiple things at once from its behavior. Numbers go in, numbers come out, who cares what happens in the middle? Writing such a program without the ability to do multiple things at once does not produce a logically incorrect program, just a slow one. Such a program probably does not need semaphores when running on a single CPU, because it didn't need concurrency in the first place.
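A small sketch of that map-style program, with a made-up transform function: the result is the same whether one worker or several do the work.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(x):
    """Hypothetical per-element computation."""
    return x * x + 1

def map_with_workers(data, workers):
    """Apply transform to every element using the given number of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform, data))   # results keep input order
```

One caveat for this particular sketch: in standard CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so a process pool would be the tool for an actual speed-up. The observable output is identical either way, which is the point being made.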

Controlling the process allocation to a processor

Does fork always create a process on a separate processor?
Is there a way I could pin the forked process to a particular processor? For example, if I have 2 processors and want the fork to create a parallel process but on the same processor that runs the parent. Does Node.js provide any method for this? I am looking for control over the allocation of the processes. ... Is this even a good idea?
Also, what is the maximum number of processes that could be forked, and why?
I've no Node.js wisdom to impart, simply some info on what OSes generally do.
Any modern OS will schedule processes / threads on CPUs and cores according to the prevailing burden on the machine. The whole point is that they're very good at this, so one is going to have to try very hard to come up with scheduling / core affinity decisions that beat the OS. Almost no one bothers. Unless you're running on very specific hardware (which perhaps, perhaps one might get to understand very well), you're having to make a lot of complex decisions for every single different machine the code runs on.
If you do want to try, then I assume you'll have to dig deep below Node.js to make calls to the underlying C library. Most OSes (including Linux) provide a means for a process to control core affinity (on Linux it's exposed through glibc).
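For illustration only (in Python rather than Node.js, and assuming Linux), keeping a forked child on the parent's CPU could look like the sketch below. It relies on the fact that a child created with fork(2) inherits its parent's CPU affinity mask; fork_on_same_cpu is a name invented here.

```python
import os

def fork_on_same_cpu():
    """Fork a child that is confined to the CPU the parent is pinned to."""
    original = os.sched_getaffinity(0)
    cpu = min(original)
    os.sched_setaffinity(0, {cpu})         # pin the parent first
    read_fd, write_fd = os.pipe()
    pid = os.fork()                        # the child inherits the mask
    if pid == 0:
        os.close(read_fd)
        child_cpus = sorted(os.sched_getaffinity(0))
        os.write(write_fd, ",".join(map(str, child_cpus)).encode())
        os._exit(0)
    os.close(write_fd)
    report = os.read(read_fd, 1024).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    os.sched_setaffinity(0, original)      # undo the pinning in the parent
    return cpu, [int(c) for c in report.split(",")]
```

Whether this is a good idea is a separate matter: as said above, the scheduler usually does better when left alone.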

What's the point of multi-threading on a single core?

I've been playing with the Linux kernel recently and diving back into the days of OS courses from college.
Just like back then, I'm playing around with threads and the like. All this time I had been assuming that threads were automatically running concurrently on multiple cores but I've recently discovered that you actually have to explicitly code for handling multiple cores.
So what's the point of multi-threading on a single core? The only example I can think of is from college when writing a client/server program but that seems like a weak point.
All this time I had been assuming that threads were automatically
running concurrently on multiple cores but I've recently discovered
that you actually have to explicitly code for handling multiple cores.
The above is incorrect for any widely used, modern OS. All of Linux's schedulers, for example, will automatically schedule threads on different cores and even automatically move threads from one core to another when necessary to maximize core usage. There are some APIs that allow you to modify the schedulers' behavior, but these APIs are generally used to disable automatic thread-to-core scheduling, not to enable it.
So what's the point of multi-threading on a single core?
Imagine you have a GUI program whose purpose is to execute an expensive computation (for example, render a 3D image or a Mandelbrot set) and then display the result. Let's say this computation takes 30 seconds to complete on this particular CPU. If you implement that program the obvious way, and use only a single thread, then the user's GUI controls will be unresponsive for 30 seconds while the calculation is executing -- the user will be unable to do anything with your program, and possibly unable to do anything with his computer at all. Since users expect GUI controls to be responsive at all times, that would be a poor user experience.
If you implement that program with two threads (one GUI thread and one rendering thread), on the other hand, the user will be able to click buttons, resize the window, quit the program, choose menu items, etc, even while the computation is executing, because the OS is able to wake up the GUI thread and allow it to handle mouse/keyboard events when necessary.
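A stripped-down sketch of the two-thread design, with no real GUI toolkit involved: render stands in for the expensive computation (shortened to a fraction of a second), and the loop counter stands in for handling mouse and keyboard events. All names are invented for the example.

```python
import queue
import threading
import time

def render(result_box):
    """Stand-in for the expensive computation."""
    time.sleep(0.2)                        # pretend this is 30 seconds of work
    result_box.put("image ready")

def run_gui_loop():
    """Keep 'handling events' on the main thread until the worker finishes."""
    results = queue.Queue()
    threading.Thread(target=render, args=(results,), daemon=True).start()
    handled_events = 0
    while True:
        try:
            outcome = results.get(timeout=0.01)   # has the worker finished?
            return handled_events, outcome
        except queue.Empty:
            handled_events += 1            # a real GUI would process clicks here
```

The main thread keeps going around its loop while the worker is busy, which is what keeps the controls responsive.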
Of course, it is possible to write this program with a single thread and keep its GUI responsive, by writing your single thread to do just a few milliseconds worth of computation, then check to see if there are GUI events available to process, handling them, then going back to do a bit more computation, etc. But if you code your app this way, you are essentially writing your own (very primitive) thread scheduler inside your app anyway, so why reinvent the wheel?
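The single-threaded alternative, doing a slice of work and then checking for events, might be sketched with a Python generator. The summing workload and the list of pending events are made up for the illustration.

```python
def compute_in_slices(total_steps, slice_size):
    """Do the work in small slices, yielding the running total after each."""
    acc = 0
    for start in range(0, total_steps, slice_size):
        for i in range(start, min(start + slice_size, total_steps)):
            acc += i
        yield acc                          # hand control back to the event loop

def event_loop(pending_events, total_steps=1000, slice_size=100):
    """Alternate between one slice of computation and draining GUI events."""
    handled = []
    result = None
    for result in compute_in_slices(total_steps, slice_size):
        while pending_events:              # between slices, handle events
            handled.append(pending_events.pop(0))
    return result, handled
```

This works, but the application now decides when to yield, which is exactly the hand-written scheduler the paragraph above is warning about.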
The first versions of MacOS were designed to run on a single core, but had no real concept of multithreading. This forced every application developer to correctly implement some manual thread management -- even if their app did not have any extended computations, they had to explicitly indicate when they were done using the CPU, e.g. by calling WaitNextEvent. This lack of multithreading made early (pre-MacOS-X) versions of MacOS famously unreliable at multitasking, since just one poorly written application could bring the whole computer to a grinding halt.
First, a program not only computes but also waits for input/output, and so can be considered as also executing on an I/O processor. So even a single-core machine is a multi-processor machine, and employing multi-threading is justified.
Second, a task can be divided into several threads for the sake of modularity.
Multithreading is not only for taking advantage of multiple cores.
You need multiple processes for multitasking. For a similar reason you are allowed to have multiple threads, which are lightweight compared with processes.
You probably don't want to spawn processes all the time for things like blocking I/O. That may be overkill.
And there is fiber, which is even more lightweight. So we have process, thread, and fiber for different levels of needs.
Well, when you say multithreading on a single core, there are things you need to consider. For example, the thread API that you are using: is it user level or kernel level? From your question, I believe you are most probably using user-level threads.
Now, user-level threads, depending on the host OS or the API itself, may map to a single kernel thread or to multiple. Many mappings are possible, such as 1-1, many-1, or many-many.
Now, if there is a single core, your OS can still provide you with several kernel-level threads, which may behave like multiple processes to the CPU. In that case, the OS will time-slice (and multi-program) the kernel threads, with very fast context switches, and via the user-level API your code will appear to be multithreaded.
Also note that even though your processor is a single core, depending on the make, it can be hyperthreaded and have very deep pipelines, allowing kernel threads to run concurrently with very low overhead.
For reference, check the Intel/AMD architecture documentation and how various OSes provide kernel threads.

multithreading and multitasking on single core processor vs multicore processor

Definitions:
Process (task): a program in execution, e.g. Notepad.
Thread: A thread is a single sequence of instructions. A process consists of one or more threads (but only one can execute at a time).
According to the lecture, a single-core processor can run a single process (task) at a time. Only one thread can execute at a time, but the operating system achieves multithreading using time slicing (thread context switches). This thread switching happens frequently enough that the user perceives the threads as running at the same time (but they aren't running in parallel!), and it occurs inside the one process. A process context switch is similar to a thread context switch, with the difference that it takes place between processes (for example, between a media player and Notepad) instead of between threads.
I'm not sure if this example is valid: take two processes, e.g. Notepad and a media player, on a single-core processor. One can play music and write in Notepad at the same time although the two processes aren't running in parallel (process context switching, or multitasking). Inside one process, e.g. the media player, one can listen to music and create playlists at the same time although the two threads aren't running in parallel (thread context switching, or multithreading).
1st Question: Is my information above right?
2nd Question: Would the execution of threads on a multicore processor look the same as on a single core, with the difference that the threads of different processes can run in parallel? Is multithreading here the process of running multiple threads simultaneously on different processors, or the process of switching between threads on one core? The same question also applies to multitasking.
How would the process context switch and thread context switch take place in this case?
3rd Question: The professor used the term single-threaded processor. Is this term another name for a single-core processor?
or
Several threads belonging to the same process can be executed on several CPU cores simultaneously. Time slicing still happens on multicore systems. Say one has a process with 20 threads running on a quad-core: the OS still has to schedule 21 threads to run on only 4 cores.
A single-threaded process runs on only one core at a time. But that doesn't mean it'll run on the same core until it exits. The OS might give it a time slice to run on Core 1 now, pause it, and give it another time slice on Core 2 later.
Note: I read a lot of books and I googled enough before I decided to ask here.
EDITED
Yes, you seem to have a good understanding of this topic (not sure if it is really interesting, though). However, you seem to be overthinking it. I suggest a simpler way of understanding how it works on modern systems (it really is the wild west when you start looking back, with the idea of lightweight processes and such, but I will not talk about that).
The process is a shell. Its only purpose in life is to provide an environment for threads. Only the threads are really executed; the process itself is never executed. A single process can host multiple threads within it, and when it hosts only one thread, one can say the process is executed, but that is simply a manner of speaking. A CPU can only execute a thread, not a process.
Your professor, as they often do, makes misleading statements. There is no such thing as a single-threaded processor. There are single-core and multicore processors, and those processors can be joined together to provide a multi-processor environment. From the application developer's perspective, a single CPU with 4 cores does not differ from 4 single-core CPUs. There are differences, of course, but usually not for the application developer.
Multitasking is a layman's term. It can mean whatever one wants it to mean, and is better avoided in non-specific contexts.
I hope I clarified your confusion.
The answer to your questions is the following:
Q1: On a single-core processor, two tasks can't run in parallel in the sense of executing two (processor) instructions at the same time; the only possible way of multithreading is time slicing performed by the task scheduler (of the OS), so in that case you are approximately right. I would complete your view of the subject with the fact that nowadays almost no applications are single-threaded. I don't know if Notepad uses multiple threads, but I'm pretty sure the media player is multithreaded, and the task scheduler schedules time slices between threads, not processes. (Fun fact: a single-threaded .NET application already runs 4-5 threads.)
Q2: The task scheduler on any system tries to spread the load between available cores, so time slices will most likely work as you described above, but if a process starts an additional thread, it will be executed on the core with the least load. Multiple cores also mean that multiple (processor) instructions can and will be executed at the same time.
Q3: In practice, a multithreaded processor and a multicore processor mean something very similar, but not the same. For example, Intel Core i3/i5/i7 CPUs are equipped with an internal pseudo-task-scheduler (Hyper-Threading), which doubles the number of virtual cores by scheduling the execution of two threads on the same core, so for example my i5 system has 2 cores but 4 threads.
Most of your concepts seem valid, though the terms are non-standard.
Here is an explanation of threads and processes, and then of multithreading.
It is true that a process is a running instance of a program.
Before threads existed, resources were distributed only among processes.
Now processes have threads, so resources are distributed to threads, but isolation is still at the process level, meaning two processes still need IPC to communicate with each other. You can think of threads as lightweight processes which can be scheduled by the operating system. Multithreading is an extension of multitasking, so if there is one core and two processes, one with two threads and one with 4 threads, the contention for the core is between 6 threads, not 2 processes.
For thread switch vs. process switch, see thread context switch vs process context switch.

Threads vs Processes in Linux [closed]

I've recently heard a few people say that in Linux, it is almost always better to use processes instead of threads, since Linux is very efficient in handling processes, and because there are so many problems (such as locking) associated with threads. However, I am suspicious, because it seems like threads could give a pretty big performance gain in some situations.
So my question is, when faced with a situation that threads and processes could both handle pretty well, should I use processes or threads? For example, if I were writing a web server, should I use processes or threads (or a combination)?
Linux uses a 1-1 threading model, with (to the kernel) no distinction between processes and threads -- everything is simply a runnable task. *
On Linux, the system call clone clones a task, with a configurable level of sharing, among which are:
CLONE_FILES: share the same file descriptor table (instead of creating a copy)
CLONE_PARENT: don't set up a parent-child relationship between the new task and the old (otherwise, child's getppid() = parent's getpid())
CLONE_VM: share the same memory space (instead of creating a COW copy)
fork() calls clone(least sharing) and pthread_create() calls clone(most sharing). **
forking costs a tiny bit more than pthread_createing because of copying tables and creating COW mappings for memory, but the Linux kernel developers have tried (and succeeded) at minimizing those costs.
Switching between tasks, if they share the same memory space and various tables, will be a tiny bit cheaper than if they aren't shared, because the data may already be loaded in cache. However, switching tasks is still very fast even if nothing is shared -- this is something else that Linux kernel developers try to ensure (and succeed at ensuring).
In fact, if you are on a multi-processor system, not sharing may actually be beneficial to performance: if each task is running on a different processor, synchronizing shared memory is expensive.
* Simplified. CLONE_THREAD causes signal delivery to be shared (which needs CLONE_SIGHAND, which shares the signal handler table).
** Simplified. There exist both SYS_fork and SYS_clone syscalls, but in the kernel, the sys_fork and sys_clone are both very thin wrappers around the same do_fork function, which itself is a thin wrapper around copy_process. Yes, the terms process, thread, and task are used rather interchangeably in the Linux kernel...
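The practical difference between "least sharing" and "most sharing" can be observed from user space. This is only a sketch using Python's os.fork and threading (not a direct clone call), with an invented counter variable.

```python
import os
import threading

counter = {"value": 0}

def bump():
    counter["value"] += 1

def bump_in_thread():
    """A thread shares the address space, so its write is visible afterwards."""
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter["value"]

def bump_in_forked_child():
    """A forked child writes to its own copy-on-write copy of the memory."""
    pid = os.fork()
    if pid == 0:
        bump()                 # changes the child's private copy only
        os._exit(0)
    os.waitpid(pid, 0)
    return counter["value"]    # the parent's value is unchanged
```

After bump_in_forked_child the parent still sees the old value, while bump_in_thread increases it by one.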
Linux (and indeed Unix) gives you a third option.
Option 1 - processes
Create a standalone executable which handles some part (or all parts) of your application, and invoke it separately for each process, e.g. the program runs copies of itself to delegate tasks to.
Option 2 - threads
Create a standalone executable which starts up with a single thread and creates additional threads to do some tasks.
Option 3 - fork
Only available under Linux/Unix, this is a bit different. A forked process really is its own process with its own address space - there is nothing that the child can do (normally) to affect its parent's or siblings' address space (unlike a thread) - so you get added robustness.
However, the memory pages are not copied, they are copy-on-write, so less memory is usually used than you might imagine.
Consider a web server program which consists of two steps:
Read configuration and runtime data
Serve page requests
If you used threads, step 1 would be done once, and step 2 done in multiple threads. If you used "traditional" processes, steps 1 and 2 would need to be repeated for each process, and the memory to store the configuration and runtime data duplicated. If you used fork(), then you can do step 1 once, and then fork(), leaving the runtime data and configuration in memory, untouched, not copied.
So there are really three choices.
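A sketch of the third option, with step 1 done once and step 2 done in forked workers. The configuration contents and function names are placeholders invented for the example, and the pipe is only there so the parent can collect what the workers did.

```python
import os

def load_config():
    """Step 1: stand-in for expensive start-up work (made-up data)."""
    return {"docroot": "/srv/www", "workers": 3}

def serve_requests(config, worker_id):
    """Step 2: each worker uses the configuration it inherited."""
    return f"worker {worker_id} serving {config['docroot']}"

def start_workers():
    config = load_config()                 # done once, before any fork()
    read_fd, write_fd = os.pipe()
    pids = []
    for worker_id in range(config["workers"]):
        pid = os.fork()
        if pid == 0:                       # child: config is already in memory
            os.close(read_fd)
            line = serve_requests(config, worker_id) + "\n"
            os.write(write_fd, line.encode())
            os._exit(0)
        pids.append(pid)
    os.close(write_fd)
    for pid in pids:
        os.waitpid(pid, 0)
    with os.fdopen(read_fd) as reader:
        return sorted(reader.read().splitlines())
```

In CPython the sharing is less complete than in a C program, because reference counting writes to object headers and so dirties some pages, but the structure of the approach is the same.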
That depends on a lot of factors. Processes are more heavy-weight than threads, and have a higher startup and shutdown cost. Interprocess communication (IPC) is also harder and slower than interthread communication.
Conversely, processes are safer and more secure than threads, because each process runs in its own virtual address space. If one process crashes or has a buffer overrun, it does not affect any other process at all, whereas if a thread crashes, it takes down all of the other threads in the process, and if a thread has a buffer overrun, it opens up a security hole in all of the threads.
So, if your application's modules can run mostly independently with little communication, you should probably use processes if you can afford the startup and shutdown costs. The performance hit of IPC will be minimal, and you'll be slightly safer against bugs and security holes. If you need every bit of performance you can get or have a lot of shared data (such as complex data structures), go with threads.
Others have discussed the considerations.
Perhaps the important difference is that in Windows processes are heavy and expensive compared to threads, and in Linux the difference is much smaller, so the equation balances at a different point.
Once upon a time there was Unix and in this good old Unix there was lots of overhead for processes, so what some clever people did was to create threads, which would share the same address space with the parent process and they only needed a reduced context switch, which would make the context switch more efficient.
In a contemporary Linux (2.6.x) there is not much difference in performance between a context switch of a process compared to a thread (only the MMU work is additional for the process).
There is the issue with the shared address space, which means that a faulty pointer in a thread can corrupt memory of the parent process or another thread within the same address space.
A process is protected by the MMU, so a faulty pointer will just cause a signal 11 and no corruption.
I would in general use processes (not much context switch overhead in Linux, but memory protection due to MMU), but pthreads if I needed a real-time scheduler class, which is a different cup of tea altogether.
Why do you think threads have such a big performance gain on Linux? Do you have any data for this, or is it just a myth?
I think everyone has done a great job responding to your question. I'm just adding more information about threads versus processes in Linux to clarify and summarize some of the previous responses in the context of the kernel. So, my response concerns kernel-specific code in Linux. According to the Linux kernel documentation, there is no clear distinction between a thread and a process, except that a thread uses a shared virtual address space, unlike a process. Also note that the Linux kernel uses the term "task" to refer to processes and threads in general.
"There are no internal structures implementing processes or threads, instead there is a struct task_struct that describe an abstract scheduling unit called task"
Also, according to Linus Torvalds, you should NOT think about process versus thread at all, because it's too limiting; the only difference is the COE, or Context of Execution, in terms of "separate the address space from the parent" or shared address space. In fact he uses a web server example to make his point here (which I highly recommend reading).
Full credit to the Linux kernel documentation.
If you want to create as pure a process as possible, you would use clone() and clear all the sharing flags. (Or save yourself the typing effort and call fork().)
If you want to create as pure a thread as possible, you would use clone() and set all the sharing flags. (Or save yourself the typing effort and call pthread_create().)
There are 28 flags that dictate the level of resource sharing. This means that there are over 268 million flavours of tasks that you can create, depending on what you want to share.
This is what we mean when we say that Linux does not distinguish between a process and a thread, but rather alludes to any flow of control within a program as a task. The rationale for not distinguishing between the two is, well, not uniquely defining over 268 million flavours!
Therefore, making the "perfect decision" of whether to use a process or thread is really about deciding which of the 28 resources to clone.
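The "over 268 million" figure is just the number of on/off combinations of 28 flags. The check below takes the count of 28 at face value and assumes every flag can be toggled independently, which is a simplification since some flags require others.

```python
FLAG_COUNT = 28                      # number of sharing flags quoted above
combinations = 2 ** FLAG_COUNT       # each flag is either set or cleared
print(f"{combinations:,}")           # 268,435,456
```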
How tightly coupled are your tasks?
If they can live independently of each other, then use processes. If they rely on each other, then use threads. That way you can kill and restart a bad process without interfering with the operation of the other tasks.
To complicate matters further, there is such a thing as thread-local storage, and Unix shared memory.
Thread-local storage allows each thread to have a separate instance of global objects. The only time I've used it was when constructing an emulation environment on Linux/Windows, for application code that ran in an RTOS. In the RTOS each task was a process with its own address space; in the emulation environment, each task was a thread (with a shared address space). By using TLS for things like singletons, we were able to have a separate instance for each thread, just like under the 'real' RTOS environment.
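As a small illustration of the idea (in Python, not the RTOS emulation code described above), threading.local gives every thread its own copy of a "global". The names task_state and run_task are made up.

```python
import threading

task_state = threading.local()       # each thread sees its own attributes

def run_task(name, results):
    task_state.singleton = f"instance for {name}"   # private to this thread
    results[name] = task_state.singleton

def run_tasks():
    results = {}
    threads = [threading.Thread(target=run_task, args=(n, results))
               for n in ("task-1", "task-2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The main thread never set the attribute, so it does not exist here.
    return results, hasattr(task_state, "singleton")
```

Each task sees only the instance it created, even though task_state looks like a single module-level object.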
Shared memory can (obviously) give you the performance benefits of having multiple processes access the same memory, but at the cost/risk of having to synchronize the processes properly. One way to do that is have one process create a data structure in shared memory, and then send a handle to that structure via traditional inter-process communication (like a named pipe).
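A sketch of that handle-passing scheme using Python's multiprocessing.shared_memory (Python 3.8 or later) on a Unix-like system. Here the "handle" is the segment's name, and it is sent over an anonymous pipe instead of a named pipe to keep the example short; the function name is invented.

```python
import os
from multiprocessing import shared_memory

def share_counter_with_child():
    """Create a segment, pass its name to a forked child, read what it wrote."""
    shm = shared_memory.SharedMemory(create=True, size=8)
    shm.buf[:8] = (0).to_bytes(8, "little")
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(write_fd)
            name = os.read(read_fd, 256).decode()        # receive the handle
            view = shared_memory.SharedMemory(name=name)  # attach by name
            view.buf[:8] = (42).to_bytes(8, "little")     # write into the segment
            view.close()
        finally:
            os._exit(0)
    os.close(read_fd)
    os.write(write_fd, shm.name.encode())                 # send the handle
    os.close(write_fd)
    os.waitpid(pid, 0)          # waiting for the child is the synchronization
    value = int.from_bytes(bytes(shm.buf[:8]), "little")
    shm.close()
    shm.unlink()                # remove the segment once nobody needs it
    return value
```

Here waiting for the child to exit is the only synchronization; processes that run side by side would need a lock or semaphore around the shared data.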
In my recent work with Linux, one thing to be aware of is libraries. If you are using threads, make sure any libraries you use across threads are thread-safe. This burned me a couple of times. Notably, libxml2 is not thread-safe out of the box. It can be compiled with thread safety, but that is not what you get with aptitude install.
I'd have to agree with what you've been hearing. When we benchmark our cluster (xhpl and such), we always get significantly better performance with processes over threads. </anecdote>
The decision between thread/process depends a little bit on what you will be using it for.
One of the benefits with a process is that it has a PID and can be killed without also terminating the parent.
For a real-world example of a web server: Apache 1.3 only supported multiple processes, but in 2.0 they added an abstraction so that you can switch between either. Comments seem to agree that processes are more robust but threads can give slightly better performance (except on Windows, where performance for processes sucks and you only want to use threads).
For most cases I would prefer processes over threads.
Threads can be useful when you have relatively small tasks (process overhead >> time taken by each divided task unit) and there is a need to share memory between them. Think of a large array.
Also (off-topic), note that if your CPU utilization is 100 percent or close to it, there is going to be no benefit from multithreading or multiprocessing (in fact, it will get worse).
Threads --> Threads share a memory space; a thread is an abstraction of the CPU and is lightweight.
Processes --> Processes have their own memory space; a process is an abstraction of a computer.
To parallelise a task you need to abstract a CPU.
However, the advantages of using a process over a thread are security and stability, while a thread uses less memory than a process and offers lower latency.
An example in terms of the web would be Chrome and Firefox.
In the case of Chrome, each tab is a new process, hence the memory usage of Chrome is higher than that of Firefox, while the security and stability provided are better than in Firefox.
The security provided by Chrome is better because each tab is a new process, so a different tab cannot snoop into the memory space of a given process.
Multi-threading is for masochists. :)
If you are concerned about an environment where you are constantly creating threads/forks, perhaps like a web server handling requests, you can pre-fork processes, hundreds if necessary. Since they are Copy on Write and use the same memory until a write occurs, it's very fast. They can all block, listening on the same socket and the first one to accept an incoming TCP connection gets to run with it. With g++ you can also assign functions and variables to be closely placed in memory (hot segments) to ensure when you do write to memory, and cause an entire page to be copied at least subsequent write activity will occur on the same page. You really have to use a profiler to verify that kind of stuff but if you are concerned about performance, you should be doing that anyway.
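A toy version of the pre-fork pattern described above: every worker blocks in accept() on the same listening socket, and whichever one gets the connection serves it. To keep the sketch finite, each worker serves exactly one connection and replies with its own PID; the function names are invented.

```python
import os
import socket

def prefork_server(worker_count):
    """Fork workers that all wait in accept() on one shared listening socket."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    pids = []
    for _ in range(worker_count):
        pid = os.fork()
        if pid == 0:
            try:
                conn, _addr = listener.accept()   # the winning worker serves it
                with conn:
                    conn.sendall(str(os.getpid()).encode())
            finally:
                os._exit(0)                       # one connection per worker
        pids.append(pid)
    port = listener.getsockname()[1]
    listener.close()     # the parent is done; the children keep their copies
    return port, pids

def ask(port):
    """Connect once and return the PID of the worker that answered."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        return int(client.recv(64))
```

With two workers, two successive calls to ask(port) are answered by two different PIDs, since each worker exits after its single connection.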
Development time of threaded apps is 3x to 10x longer due to the subtle interactions on shared objects and threading "gotchas" you didn't think of, and they are very hard to debug because you cannot reproduce thread interaction problems at will. You may have to do all sorts of performance-killing checks, like having invariants in all your classes that are checked before and after every function, halting the process and loading the debugger if something isn't right. Most often it's embarrassing crashes that occur in production, and you have to pore through a core dump trying to figure out which threads did what. Frankly, it's not worth the headache when forking processes is just as fast and implicitly thread safe unless you explicitly share something. At least with explicit sharing you know exactly where to look if a threading-style problem occurs.
If performance is that important, add another computer and load balance. For the developer cost of debugging a multi-threaded app, even one written by an experienced multi-threader, you could probably buy 4 40 core Intel motherboards with 64gigs of memory each.
That being said, there are asymmetric cases where parallel processing isn't appropriate: for example, you want a foreground thread to accept user input and show button presses immediately, without waiting for some clunky back end GUI to keep up. Sexy use of threads where multiprocessing isn't geometrically appropriate. Many things like that are just variables or pointers. They aren't "handles" that can be shared in a fork. You have to use threads. Even if you did fork, you'd be sharing the same resource and subject to threading-style issues.
If you need to share resources, you really should use threads.
Also consider the fact that context switches between threads are much less expensive than context switches between processes.
I see no reason to explicitly go with separate processes unless you have a good reason to do so (security, proven performance tests, etc...)

Resources