Can this be viewed as an I/O bound task?

Can this be viewed as an I/O bound task? - multithreading

Let's say you have 4 physical cores on your computer, and let's assume there is no hyperthreading and that python version is 3.2+ (although I am not sure if these extra information matter for my question).
If I were to open a pool of 3 subprocesses, hence each subprocess occupying one physical core when they do some CPU bound tasks, and if I were to open up 3 threads from the current process (occupying one core left from 4) where the OS is running, and if I were to send CPU bound tasks down the multiprocessing to each of the 3 subprocesses, then the question is this:
From the perspective of the current process that is managing the threads (and these threads are pushing tasks out to each subprocesses and is waiting for the result to come back from these subprocesses), can these CPU bound tasks be viewed as I/O bound tasks (from the perspective of current process) since the current process is not actually doing any work? Equivalently, will the 3 threads go to sleep, while the 3 subprocesses are crunching away at the numbers and occupying the 3 cores, and let the last core sit there idling?

will the 3 threads go to sleep, while the 3 subprocesses are crunching away at the numbers and occupying the 3 cores, and let the last core sit there idling?
Yes. I can't imagine what else could happen, do you have another possibility in mind? As you say, the threads are waiting.
In this situation you can probably make 4 processes to work on the CPU bound tasks.
It sounds like your problem is well suited for multiprocessing.Pool. In that case note that if you don't specify the number of processes to use, it uses the number of CPU cores by default:
processes is the number of worker processes to use. If processes is None then the number returned by os.cpu_count() is used.
which is an official sign that using as many processes as cores is a normal practice.

Related

Concurrent threads, processes and multiple cores

I'm trying to understand the usage of CPU cores with regard to concurrent threads and processes. Please see the below questions:
Assume I have 2 CPU cores. When there are 2 processes running, each process has only 1 thread. Are the two processes using the 2 cores?
Assume I have 2 CPU cores. When there is 1 process running, which has 2 threads. Are the two threads using the 2 cores?
Assume I have 2 CPU cores. When there are 2 processes running, each process has 2 threads. How are the two cores used by those processes and cores?
How to calculate the maximum real concurrent execution given CPU cores? What are other factor I should take into account?

1,2: Quite likely but not definitely. A portion of the system software determines what runs where. It would be unlikely to choose to keep a process or thread waiting for cpu attention when there is one that is otherwise idle, it isn't absolute.
Most processing involves some sort of transfer to and from a device, network, etc.. Typically this necessitates a period of inactivity waiting for the transfer to complete. During this inactivity, another process / thread can run on that cpu. So, if a given process is 30% cpu time and 70% I/O time, then I can run about 3 of them concurrently on a single cpu without degrading performance.
3,4: Like the paragraph above implies, depending upon the workload, their could be any distribution of the threads among the cpus. If the threads were all compute bound (100% cpu), most operating systems switch between them at a small enough granularity that all remain lively, and large enough that the switching has a minimal impact on them.
This scheduling may take other notions into consideration, such as data affinity. Recently touched bits of data are likely to remain in the cpu cache when a thread has relinquished it. The next time the thread is to be scheduled, it would be best to put it onto that cpu, to retain the effort required to warm the cache for it. It might also think that two threads of one process (address space) are more likely to share data, so should prefer the same cpu.
4: depending upon your system, there are likely to be many performance analysis tools available. Top, on UNIX-inspired systems is a simple tool which gives system wide utilization information, and the simple tool time will show how much time a process spent on a cpu vs real-world time. If you run each of your tasks sequentially, noting the cpu-time that they take, then time them running concurrently, the ratio between these cpu-times indicates the scaling factor of your concurrent app. Note that real-world time can be misleading because of io-overlap.

How is fairness of thread scheduling ensured across processes?

Every process has at least one thread of execution and I read somewhere that modern Operating Systems only schedule Thread and not process.
So if there are two processes running in the system - P1 with 1 thread and P2 with 100 threads, how will OS scheduling algorithm ensure that both P1 and P2 get approximately same amount of CPU time? If OS blindly schedules threads, P2 will get 100 times more CPU time than P1.
Does it also take into account which Process a particular thread belong to? Otherwise, it seems too easy for a process to hog all the CPU by creating more threads.

Does it also take into account which Process a particular thread belong to? Otherwise, it seems too easy for a process to hog all the CPU by creating more threads.
Wrong question. Consider two jobs that are trying to solve the exact same problem by doing the same work and are perfectly identical except for one thing -- one uses dozens of threads, the other uses dozens of processes. Why should the one that uses dozens of processes get more CPU time than the one that uses dozens of threads?
Your notion of fairness is not really a sensible one.
Instead, scheduling is more designed around trying to get as much work done as possible per unit time. The assumption is that everything the computer is doing is useful and it benefits competing tasks to have other tasks competing with them finish as quickly as possible too.
This is actually all you need the vast majority of the time. But occasionally you have special situations where this doesn't work. One is ultra-high-priority tasks like keeping video or audio flowing or keeping a user interface responsive. Another is ultra-low-priority tasks where there's an enormous amount of work you want done and you don't want the system to be slow for a long time while you're working on it. Priorities are used for this, and generally the system allows higher-priority threads to interrupt lower-priority ones to keep responsiveness.

In general, "fair thread scheduling" attempts to give each thread an equal amount of CPU time (regardless of how much CPU time all threads in a process get); and "fair process scheduling" attempts to give each process the same amount of CPU time (e.g. by giving threads belonging to different processes unequal amounts of CPU time). These are mutually exclusive - you can't have both (unless each process has the same number of threads).
Note that it's all a broken joke anyway. For example, if one thread gets 10 ms of time on a CPU that is running slow due to thermal throttling (and/or because another logical CPU in the same core is busy) and another thread gets 10 ms of time on a CPU that is running faster than normal (e.g. due to "turbo-boost" and/or because the other logical CPU in the core is not being used); then these threads have received an equal amount of CPU time but have not received anything that could be considered "fair" (because one thread might be able to get 20 times as much work done than the other).
Note that it's all unwanted anyway. For example, for a good OS threads would be given a priority to indicate how important the work they do is, and you don't want a high priority thread (doing very important work) to get the same "fair share" of CPU time as a low priority thread (doing irrelevant/unimportant work). For cases where two threads have equal priority you might (in theory) want them to get an "equal" amount of CPU time; but in practice this isn't common and threads block and unblock so often that it isn't worth caring about; and in practice it can lead to "two half finished jobs instead of one completed job and one unstarted job" scenarios that increases the average amount of time a job (e.g. request for work) takes to complete.

If the thread is the basic unit of scheduling (a generally safe assumption these days) then the process scheduler is the one to decide who to allocate the CPUs. How (and whether) it takes thread usage into account is entirely system specific. AND the behavior ma depends upon the type of process. For example, in VMS (and adopted in Windoze) realtime processes are treated differently than other types of processes.
In the VMS-type scheduling, a process with more threads gets more CPU by design. Better for an application to use more threads and for it to use more processes.
Keep in mind that a system may impose limits on the number of threads in a process.

Will a multi-threaded application be actually faster than a single-threaded application?

All is entirely theoretical, the question just came to mind and I wasn't entirely sure whats the answer:
Assume you have an application that calculates 4 independent calculations. (Totally independent, doesn't matter what order you do them and you don't need one to calculate another).
Also assume those calculations are long (minutes) and CPU-bound (not waiting for any kind of IO)
1) Now, if you have a 1-processor computer, a single thread application will logically be faster than (or the same as) a multithreaded application. As the computer not able to do more then one thing at a time with one processor, it would "waste" time on context switching and the likes.
So far so good?
2) If you have a 4 processor computer, 4 threads will mostly likely be faster for this than single thread. Right? your computer can now do 4 operations at a time so its just logical to divide your application to 4 threads, and it should complete with the time the longest of the 4 calculations take.
Still good so far?
3) And now the actual part I am confused about - why would I EVER have my application create more threads than the number of processors (well actually - cores) available? I have programmed and have seen applications that create tens and hundreds of threads, but actually - the perfect number is about 8 for an average computer?
P.S. I already read this: Threading vs single thread
but didn't quiet answer that.
Cheers

Why would I EVER have my application create more threads than the number of processors (well actually - cores) available?
One very good reason is if you have threads that wait on events. For example you might have a producer/consumer application in which the producer is reading from some data stream, and that data arrives in bursts: a few hundred (or thousand) records in a batch, followed by nothing for a while, and then another burst. Say you have a 4-core machine. You could have a single producer thread that reads the data and places it in a queue, and three consumer threads to process the queue.
Or, you could have a single producer thread and four consumer threads. Most of the time, the producer thread is idle, giving you four consumer threads to process items from the queue. But when items are available on the data stream, one of the consumer threads gets swapped out in favor of the producer.
That's a simplified example, but substantially similar to programs that I have in production.
More generally, it doesn't make any sense to create more continuously-working (i.e. CPU bound) threads than you have processing units (CPU cores in general, although the existence of hyperthreading muddies the waters a bit). If you know that your threads won't be waiting on external events, then having n+1 threads when you only have n cores will end up wasting time with thread context switches. Note that this is strictly in the context of your program. If there are other applications and OS services running, your application's threads will get swapped out from time to time so that those other apps and services can get a timeslice. But one assumes that, if you're running a CPU-intensive program, you'll limit the other apps and services that are running at the same time.
Your best bet, of course, is to set up a test. On a 4-core machine, test your app with 1, 2, 3, 4, 5, ... threads. Time how long it takes to complete with different numbers of threads. I think you'll find that on a 4-core machine the sweet spot will be 3 or 4; most likely 4 unless there are other apps or OS services that take a lot of CPU.

One reason i could come up with for more threads than cores would be if some threads needed to interface with other parties... waiting for a response from a server.. querying something from the database. This will allow the thread to sleep until an answer is provided. this way other computations wouldn't have to wait. in the 4cores->4thread the thread would wait for input which possibly causes other code to have to wait too

Adding threads to your application is not strictly about performance gains. Some times you want or need to perform more than one task at the same time because that is the most logical way to architect your program.
As an example, perhaps you are writing a game engine, if you take a multi-threaded approach, you may have one thread for physics, one thread for graphics, one thread for networking, one thread for user input, one thread for resource loading from disk etc.
Also James Baxters point is very true as well. Some times threads are waiting on a resource and can not execute further until they access said resource. With only the same number of threads as cores, one core would be going to waste.

I think you are assuming that all programs are CPU bound - remember some of your threads will be waiting for I/O (disk/network/user traffic).

Does a process run threads in a sequential order?

The question is about multithreading. Say I have 3 threads, the main one, a child1, and a child2. Does the process executing these threads run it in an order that it works on one thread for a short amount of time, then works on the other, and so on and forth and keeps switching, or are the threads running without ever being stopped by the process? Somewhere I read that a thread gets stopped without finish, then another thread is worked on and stopped, then back to thread1 and so on on forth, but that wouldn't make any sense if any threads are stopped as the point of mutlithreading was that they are all concurrent and all run at the same time, but how does the processor do that?
This is in .Net/C#.

the scenario you describe is the way IS ran thread in the old age before multi-core
OS scheduled thread sequentially based in their priorities, but now... I suppose you have at least 2 core where 2 thread can run concurrently and the 3rd thread will be schedule and interrupt one of the other!!!!

The scenario you're describing is correct, except that one thread will normally be running at each time per processor core.
Simplified; if 3 threads are active on 4 cores, they will all always be allowed to run since there's always an available core to run them, while if 3 threads are active on 2 cores, only two can run at any time so they will have to take turns.

Operating systems schedule threads to execute on the available CPU cores (either real or virtual). In the past, most computers had single core CPUs, and thus only one thread could be executed at a time. Modern CPUs are typically 2, 4, or 8 core systems. Some of these cores are virtual, like Intel's hyperthreading CPUs which have twice as many virtual cores as physical cores.
However, there are almost always more threads than CPU cores available, so the OS will prioritize all of the threads on the system in order to run them as efficiently as possible. The threads created by your process may or may not truly run in parallel over any given time span, but you should assume that they will.

Threads vs Cores

Say if I have a processor like this which says # cores = 4, # threads = 4 and without Hyper-threading support.
Does that mean I can run 4 simultaneous program/process (since a core is capable of running only one thread)?
Or does that mean I can run 4 x 4 = 16 program/process simultaneously?
From my digging, if no Hyper-threading, there will be only 1 thread (process) per core. Correct me if I am wrong.

A thread differs from a process. A process can have many threads. A thread is a sequence of commands that have a certain order. A logical core can execute on sequence of commands. The operating system distributes all the threads to all the logical cores available, and if there are more threads than cores, threads are processed in a fast cue, and the core switches from one to another very fast.
It will look like all the threads run simultaneously, when actually the OS distributes CPU time among them.
Having multiple cores gives the advantage that less concurrent threads will be placed on one single core, less switching between threads = greater speed.
Hyper-threading creates 2 logical cores on 1 physical core, and makes switching between threads much faster.

That's basically correct, with the obvious qualifier that most operating systems let you execute far more tasks simultaneously than there are cores or threads, which they accomplish by interleaving the executing of instructions.
A system with hyperthreading generally has twice as many hardware threads as physical cores.

The term thread is generally used as a description of an operating system concept that has the potential to execute independently of other threads. Whether it does so depends on whether it is stuck waiting for some event (disk or screen I/O, message queue), or if there are enough physical CPUs (hyperthreaded or not) to allow it run in the face of other non-waiting threads.
Hyperthreading is a CPU vendor term that means a single core, that can multiplex its attention between two computations. The easy way to think about a hyperthreaded core is as if you had two real CPUs, both slightly slower than what the manufacture says the core can actually do.

Basically this is up to the OS. A thread is a high-level construct holding a instruction pointer, and where the OS places a threads execution on a suitable logical processor. So with 4 cores you can basically execute 4 instructions in parallell. Where as a thread simply contains information about what instructions to execute and the instructions placement in memory.
An application normally uses a single process during execution and the OS switches between processes to give all processes "equal" process time. When an application deploys multiple threads the processes allocates more than one slot for execution but shares memory between threads.
Normally you make a difference between concurrent and parallell execution. Where parallell execution is when you actually physically execute instructions of more than one logical processor and concurrent execution is the the frequent switching of a single logical processor giving the apperence of parallell execution.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string