Operating Systems Efficiency [closed] - multithreading

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
What happens in the OS if an infinite loop is running? Also, if an infinite loop is running and I try to start another program, would it work? If yes, what will be the effect on the speed of the other program?

If your program runs an infinite loop that never blocks (never waits for anything), it will keep one core of the machine busy for as long as the scheduler lets it run. Context switches happen voluntarily when your code waits for another thread or for the completion of an I/O operation; on a preemptive OS the scheduler will also interrupt the loop when its time slice expires, so other programs still get turns on that core, just less often.
If your code is completely consuming one core like this, the operating system will usually still be responsive if the machine has multiple cores and no other threads are doing the same thing.
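A minimal Python sketch of the difference between a loop that never blocks and one that does (the function names are hypothetical): both run for the same wall-clock time, but `time.process_time()` shows that the spinning loop consumes CPU the whole time while the sleeping loop consumes almost none.

```python
import time

def busy_loop(seconds: float) -> float:
    """Spin without ever blocking; return the CPU seconds this process used."""
    cpu_start = time.process_time()
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass  # never waits for anything, so only preemption can take the core away
    return time.process_time() - cpu_start

def blocking_loop(seconds: float, tick: float = 0.01) -> float:
    """Loop for the same time, but block in sleep(); return CPU seconds used."""
    cpu_start = time.process_time()
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        time.sleep(tick)  # blocks: the OS is free to run something else on this core
    return time.process_time() - cpu_start

if __name__ == "__main__":
    print(f"busy loop:     {busy_loop(0.5):.3f} CPU seconds in 0.5 s")
    print(f"blocking loop: {blocking_loop(0.5):.3f} CPU seconds in 0.5 s")
```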

The OS itself carries on as usual even if you are running an infinite loop; what matters is what the loop does to the system's three key resources: CPU, memory (RAM) and disk.
Saying that the loop is infinite is not enough; you also need to specify what kind of processing you are doing in it.
Is it doing purely in-memory processing that never leaves the CPU free for other programs or threads (i.e. the CPU is 100% occupied)?
Most modern systems have multi-core CPUs, so most likely it will keep only one core busy and the rest will be free for other programs. If your loop performs disk I/O, it will periodically give up even that core while it waits, and the OS may reschedule it on different cores at different times.
Is the processing continuously filling up system memory (eventually exhausting it, which can make the whole system unstable)? Or is it continuously filling up memory but only up to a limit (as in Java programs, where a fixed maximum amount of heap memory is allocated to the program)? In that case the program will terminate itself with an out-of-memory error without crashing the system, provided the heap allocated is less than total system memory.
Are you writing data to disk in that loop? Then the disk will eventually fill up, and programs and services that need to write to it will start failing.
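For the capped-memory case, here is a rough Python analogue of a fixed Java heap, assuming Linux: a child process (started via `subprocess`) lowers its own address-space limit with `resource.setrlimit(resource.RLIMIT_AS, ...)` and then allocates in an endless loop. The 256 MiB cap and 16 MiB chunk size are arbitrary assumptions. The child gets a `MemoryError` and exits cleanly, while the parent and the rest of the system are unaffected.

```python
import subprocess
import sys

# Child program: caps its own address space, then allocates in an endless loop.
CHILD = r"""
import resource
LIMIT = 256 * 1024 * 1024  # assumed cap, playing the role of a fixed heap size
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
if hard != resource.RLIM_INFINITY:
    LIMIT = min(LIMIT, hard)  # an unprivileged process cannot raise the hard limit
resource.setrlimit(resource.RLIMIT_AS, (LIMIT, hard))
chunks = []
try:
    while True:  # the "infinite" loop that keeps growing
        chunks.append(bytearray(16 * 1024 * 1024))
except MemoryError:
    count = len(chunks)
    chunks.clear()  # release the memory before reporting
    print(f"MemoryError after {count} chunks")
"""

def run_capped_child() -> subprocess.CompletedProcess:
    # Separate process, so only the child ever hits the limit.
    return subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True, timeout=60)

if __name__ == "__main__":
    result = run_capped_child()
    print(result.returncode, result.stdout.strip())
```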
So, to answer this:
"Also, if an infinite loop is running and I try to start another program, would it work? If yes, what will be the effect on the speed of the other program?"
It will work if the resources the new program needs are free. How much its speed is affected depends on what kind of program you are trying to start: does it need a lot of memory? Is it multithreaded while you have no free cores? And so on.
This is just a rough idea of what might happen because of an infinite loop. On a side note, most background services running on an OS contain infinite loops, but since those loops are programmed well (they block while waiting for work instead of spinning), we don't see any adverse side effects.
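The difference between a harmful and a harmless infinite loop is whether it blocks. Here is a minimal Python sketch of a background-service style loop (the names and the `None` shutdown sentinel are hypothetical): it runs forever by design, but `queue.Queue.get()` blocks until work arrives, so it uses no CPU while idle.

```python
import queue
import threading

def service_loop(jobs: "queue.Queue", results: list) -> None:
    while True:  # infinite by design, like a background service
        job = jobs.get()  # blocks, using no CPU, until work arrives
        if job is None:  # hypothetical shutdown sentinel
            break
        results.append(job * job)

jobs = queue.Queue()
results = []
worker = threading.Thread(target=service_loop, args=(jobs, results))
worker.start()
for n in range(5):
    jobs.put(n)
jobs.put(None)
worker.join()
print(results)  # [0, 1, 4, 9, 16]
```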

Related

Can I use every thread my system has to offer at once? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 days ago.
I am writing a C++ program that will have multithreading as a major part of it to speed up some tasks. However, since I have 8 threads on my processor accessible to me, I was wondering: can I actually use all 8 simultaneously? Would it actually speed things up if I only used 7, so that my OS can use the 8th thread without fighting for processing power and slowing things down?
For the moment my code is running on Ubuntu, if that makes any difference. The task involves my threads taking in a handful of data, doing a ton of number crunching, then saving the results directly to disk from the thread. I've tested a similar thread-to-disk approach before in a different project of mine, and my SSD can handle this with no problem with the threads going, although my HDD bottlenecks (quite honestly, kinda expected that).
So mainly I'm just wondering whether it is good practice to reserve a thread or two to allow my OS to run unimpeded, or whether there's going to be no functional difference and I should just ramp my code all the way up to the max number of threads my system has.
The term "thread" in programming is different from the term "thread" as used in CPU design. When you run a multi-threaded program, you're effectively telling the operating system "here are the operations that I want to do in parallel". The operating system then schedules the threads by deciding which CPU core should run them (and if you have threaded CPUs, which of the core's threads), but there isn't a 1-to-1 correspondence; for example, if the operating system needs one CPU core for its own purposes, then it'll just use it itself, and put the threads of your program onto the other cores; even if there are more threads than cores, the operating system will simply put multiple threads on a single core and have the CPU alternate between them. (Operating systems will typically also move your threads from one core to another over the course of execution, in order to help balance how much the various cores are being used.)
This means that optimising a multi-threaded program for performance isn't as simple as creating a fixed number of software threads based on the number of CPU cores you have – sometimes, a lower number is good due to overhead in the threading mechanism, and sometimes, it's optimal to have a number of threads that's substantially greater than the number of CPU cores on the system (because this gives the OS more flexibility as to how to schedule them). As such, people generally recommend benchmarking various numbers of threads to see which is fastest for any given program.
If you're concerned about the threads of your program overwhelming the other tasks that the computer is trying to perform, there's an alternative approach to trying to fix that problem – on most operating systems, you can set a thread's priority. In cases where there are more threads trying to run than the CPU can handle simultaneously, lower-priority threads will be given less CPU time in order to reserve more CPU time for higher-priority threads. Lowering the priority of all your program's threads can thus be useful if it's causing the system to be less responsive when run.
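As a minimal Linux-oriented sketch of lowering priority from inside a program: in Python, `os.nice()` raises the nice value (which lowers the priority). On Linux the nice value is actually a per-thread attribute, so the sketch lowers it before starting worker threads, which inherit it. The increment of 10 is an arbitrary assumption; from a shell, `nice -n 10 ./your_program` achieves the same for the whole program.

```python
import os
import threading

def lower_priority(increment: int = 10) -> int:
    """Raise the nice value (i.e. lower the priority) of the calling thread.

    On Linux the nice value is per-thread, so call this before creating
    worker threads: threads created afterwards inherit the new value.
    Returns the new nice value (the kernel caps it at 19).
    """
    return os.nice(increment)

def current_nice() -> int:
    return os.nice(0)  # an increment of 0 just reads the current value

if __name__ == "__main__":
    before = current_nice()
    after = lower_priority(10)
    seen = []
    worker = threading.Thread(target=lambda: seen.append(current_nice()))
    worker.start()
    worker.join()
    print(f"nice before={before}, after={after}, worker thread sees {seen[0]}")
```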
My recommendation is to start by running the program with a number of threads equal to the maximum number your CPU can run simultaneously (8 in your case), but it's probably also worth benchmarking that number minus 1 (7) and 1.5 times that number (12) to see if either of those works better. (I'd also look at the computer's CPU utilisation when running the benchmarks – if your program is I/O-bound rather than CPU-bound, then the CPU won't be able to run at full capacity no matter how many threads you have, and in that case you might be able to run the CPU part of your program at full speed even with fewer threads than cores.)
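A small benchmarking harness along these lines might look like the following Python sketch (function names are hypothetical). It times the same batch of tasks with N-1, N and 1.5 N workers, where N is what `os.cpu_count()` reports. One caveat specific to CPython: CPU-bound work in threads is serialized by the global interpreter lock, so for CPU-bound Python code you would pass `concurrent.futures.ProcessPoolExecutor` instead; in C++ the equivalent would be timing runs with different numbers of `std::thread`s.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

def benchmark(task, inputs, worker_counts, executor_cls=ThreadPoolExecutor) -> dict:
    """Return {worker_count: wall-clock seconds} for running task over all inputs."""
    inputs = list(inputs)  # reusable across runs even if a generator was passed
    timings = {}
    for workers in worker_counts:
        start = time.perf_counter()
        with executor_cls(max_workers=workers) as pool:
            list(pool.map(task, inputs))  # wait for every result
        timings[workers] = time.perf_counter() - start
    return timings

def candidate_counts() -> list:
    """N - 1, N and 1.5 * N, where N is the number of logical CPUs the OS reports."""
    n = os.cpu_count() or 8  # fall back to the question's 8 if unknown
    return sorted({max(1, n - 1), n, (3 * n) // 2})

def fake_io_task(_) -> None:
    time.sleep(0.02)  # hypothetical stand-in; replace with the real per-item work

if __name__ == "__main__":
    for workers, secs in benchmark(fake_io_task, range(32), candidate_counts()).items():
        print(f"{workers:3d} workers: {secs:.3f} s")
```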
If the computer becomes insufficiently responsive as a consequence, try lowering the program's priority first (technically: the priority of all the program's threads), and if that doesn't work, only then should you try lowering the number of threads (or banning them from using a specific CPU) in order to prevent them causing performance issues.
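For the last resort mentioned above, keeping threads off a specific CPU, Linux exposes CPU affinity, which Python wraps as `os.sched_getaffinity`/`os.sched_setaffinity` (Linux-only). This sketch drops one allowed CPU for the calling thread; threads it creates afterwards inherit the mask. Which CPU to leave free is an arbitrary assumption here. From a shell, `taskset` does the same for a whole program.

```python
import os

def reserve_one_cpu() -> set:
    """Restrict the calling thread (and threads it creates later) to every
    currently allowed CPU except one, leaving that CPU to the rest of the system."""
    allowed = os.sched_getaffinity(0)
    if len(allowed) > 1:
        keep = set(sorted(allowed)[:-1])  # arbitrarily leave the highest-numbered CPU free
        os.sched_setaffinity(0, keep)
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    print("before:", sorted(os.sched_getaffinity(0)))
    print("after: ", sorted(reserve_one_cpu()))
```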

Process & thread scheduling overhead

There are a few things I don't quite understand when it comes to scheduling:
I assume each process/thread, as long as it is CPU-bound, is given a time window. Once the window is over, it's swapped out and another process/thread is run. Is that assumption correct? Are there any ballpark numbers for how long that window is on a modern PC? I'm assuming around 100 ms? What's the overhead of swapping out like? A few milliseconds or so?
Does the OS schedule by process or by individual kernel thread? It would make more sense to schedule each process and, within that time window, run whatever threads that process has available. That way process context switching is minimized. Is my understanding correct?
How does the time each thread runs compare to other system times, such as RAM access, network access, HD I/O etc?
If I'm reading a socket (blocking), my thread will get swapped out until data is available; then a hardware interrupt will be triggered and the data will be moved to RAM (either by the CPU or by the NIC if it supports DMA). Am I correct to assume that the thread will not necessarily be swapped back in at that point to handle the incoming data?
I'm asking primarily about Linux, but I would imagine the info would also be applicable to Windows as well.
I realize it's a bunch of different questions, I'm trying to clear up my understanding on this topic.
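I assume each process/thread, as long as it is CPU-bound, is given a time window. Once the window is over, it's swapped out and another process/thread is run. Is that assumption correct? Are there any ballpark numbers for how long that window is on a modern PC? I'm assuming around 100 ms? What's the overhead of swapping out like? A few milliseconds or so?
Not entirely. CPU-bound threads do get time slices, but pretty much all modern operating systems also use pre-emption, allowing interactive processes that suddenly need to do work (because the user hit a key, data was read from the disk, or a network packet was received) to interrupt CPU-bound tasks. As for the numbers, time slices are typically a few milliseconds to a few tens of milliseconds, and the direct cost of a context switch is on the order of microseconds rather than milliseconds (the indirect cost of refilling the caches afterwards can be larger).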
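On Linux (and other POSIX systems) you can observe both kinds of switch for your own process: `getrusage()` reports voluntary context switches (the thread blocked, e.g. on I/O or a sleep) and involuntary ones (the scheduler preempted it). A minimal Python sketch using the standard `resource` module follows; the involuntary count during the spin only grows noticeably if other runnable tasks compete for the same core.

```python
import resource
import time

def context_switches() -> tuple:
    """Return (voluntary, involuntary) context switches of this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw

def spin(seconds: float) -> None:
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        pass  # CPU-bound: can only be switched out by preemption

if __name__ == "__main__":
    v0, i0 = context_switches()
    for _ in range(50):
        time.sleep(0.001)  # each sleep blocks, normally counted as a voluntary switch
    v1, i1 = context_switches()
    spin(0.5)
    v2, i2 = context_switches()
    print(f"sleeping: +{v1 - v0} voluntary, +{i1 - i0} involuntary")
    print(f"spinning: +{v2 - v1} voluntary, +{i2 - i1} involuntary")
```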
Does the OS schedule by process or by individual kernel thread? It would make more sense to schedule each process and, within that time window, run whatever threads that process has available. That way process context switching is minimized. Is my understanding correct?
That's a complex optimization decision. The cost of blowing out the instruction and data caches is typically large compared to the cost of changing the address space, so this isn't as significant as you might think. Typically, the scheduler first picks which thread to run from all the ready-to-run threads (on Linux, each thread is scheduled as a separate task), and process stickiness may be an optimization affecting which core it runs on.
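One way to see that Linux schedules individual threads rather than whole processes: every thread gets its own kernel-level ID (also listed under `/proc/<pid>/task`, or shown by `ps -eLf`). This Python sketch (3.8+, using `threading.get_native_id()`; the helper name is hypothetical) collects those IDs; a barrier keeps all threads alive at once so no ID can be reused.

```python
import threading

def native_ids(n: int = 4) -> list:
    """Start n threads and return the kernel-level ID of each one."""
    ids = []
    lock = threading.Lock()
    barrier = threading.Barrier(n)  # all n threads are alive together, so IDs can't be recycled

    def record() -> None:
        with lock:
            ids.append(threading.get_native_id())
        barrier.wait()

    threads = [threading.Thread(target=record) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids

if __name__ == "__main__":
    print("main thread:", threading.get_native_id())
    print("workers:    ", native_ids())
```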
How does the time each thread runs compare to other system times, such as RAM access, network access, HD I/O etc?
A time slice is very long compared to a RAM access: a thread has to run for a very large number of RAM accesses between switches, because switching threads itself requires a large number of such accesses. Hard drive and network I/O, on the other hand, are generally slow enough that a thread that's waiting for them is descheduled. (As rough orders of magnitude: a RAM access takes around 100 ns, an SSD read tens to hundreds of microseconds, and a hard-disk seek several milliseconds.)
Fast SSDs change things a bit. One thing I'm seeing a lot of lately is that long-treasured optimizations, which use a lot of CPU to avoid disk accesses, can be worse than just doing the disk access on some modern machines!

Programming with threads, what is the benefit? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
Given a single core CPU, what is the benefit to coding using threads?
At least in Java (and it seems natural to extend this to any other language, given the single-core restriction), you may have several threads performing various actions, but they are time-limited and switched in and out.
Given process A and process B:
What is the benefit of performing half of process A, finishing process B, and then finishing the second half of process A, vs. performing process A and then B?
It seems that switching between the threads would introduce delays that prolong the overall completion time of both processes, compared with not switching and just completing A and then B.
The reason to use threads on a single-core system is simply to allow processes that would otherwise use all the CPU to be preempted by other tasks that need to get done sooner. The most common reason to make a system multi-threaded is to have a responsive user interface even while performing long calculations.
Of course, any operation can take a long time (reading a file, accessing a database, resizing a photo, recalculating a spreadsheet), and those operations can be performed on a separate thread to allow the thread responding to user input to operate the whole time.
Twenty years ago, for example, it was rare to have a multi-CPU system or an OS that allowed multi-threading, so nearly every program was single-threaded and there were many frameworks created to allow systems to have UIs and still do I/O. The standard mechanism for this is an event loop, where all events (UI, network, timers, etc.) are processed in a big loop.
This type of system means that the UI is held up during things like file I/O and calculations. In order to not hold up the UI too much, you have to do the I/O in chunks (say, read the file 4k at a time), processing any incoming UI events between chunks. This is really just a hack to keep the system running, but it's hard to make the system run smoothly like this because you don't know how often you need to process events.
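The chunked approach can be sketched in a single thread like this (Python; the function name, the `on_event` callback and the in-memory file are assumptions for illustration): read 4 KiB, then drain whatever UI events have queued up, then read the next chunk.

```python
import io
import queue

CHUNK_SIZE = 4096  # "read the file 4k at a time"

def chunked_event_loop(source, ui_events: "queue.Queue", on_event) -> int:
    """Single-threaded: alternate one 4 KiB read with handling pending UI events.

    Returns the number of bytes processed.
    """
    processed = 0
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:
            return processed
        processed += len(chunk)  # stand-in for real work on this chunk
        while True:  # between chunks, handle every UI event that has arrived
            try:
                event = ui_events.get_nowait()
            except queue.Empty:
                break
            on_event(event)

if __name__ == "__main__":
    events = queue.Queue()
    for e in ("click", "resize", "keypress"):
        events.put(e)
    total = chunked_event_loop(io.BytesIO(b"x" * 10_000), events, print)
    print(total, "bytes processed")
```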
The solution is to have a separate thread to recalculate your spreadsheet or write your file. That way the OS can give those threads fair timeslices while still preempting them to run the UI, allowing the UI to always be responsive.
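With a separate thread, the same job needs no chunking. A minimal Python sketch, assuming a hypothetical `recalculate` job that reports progress through a `queue.Queue`: the main loop never blocks for long (it waits at most 50 ms for a message), which is where a real UI would repaint and handle input.

```python
import queue
import threading

def recalculate(values, progress: "queue.Queue") -> None:
    """Hypothetical long-running job, e.g. recalculating a spreadsheet."""
    total = 0
    for i, v in enumerate(values, 1):
        total += v * v
        if i % 1000 == 0:
            progress.put(("progress", i))
    progress.put(("done", total))

def ui_loop(values) -> int:
    progress = queue.Queue()
    worker = threading.Thread(target=recalculate, args=(values, progress), daemon=True)
    worker.start()
    while True:
        try:
            kind, payload = progress.get(timeout=0.05)
        except queue.Empty:
            continue  # a real UI would repaint and handle input here
        if kind == "done":
            worker.join()
            return payload
        # kind == "progress": a real UI would update its progress bar with payload

if __name__ == "__main__":
    print(ui_loop(range(100_000)))
```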
An executing thread is not necessarily doing anything useful. The canonical example is reading from disk -- that data isn't going to be there for another few milliseconds, during which time the processor would be sitting unused. Threads allow one piece of the program to use the CPU while other pieces of the program are waiting for operations to complete.
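To illustrate, this Python sketch simulates disk reads with `time.sleep()` (a stand-in assumption for real I/O latency; the function names are hypothetical). Run sequentially, the waits add up; run on separate threads, they overlap, so the total is close to a single wait.

```python
import threading
import time

def fake_disk_read(results: list, index: int, latency: float) -> None:
    time.sleep(latency)  # stand-in for waiting on the disk; uses no CPU
    results[index] = f"block {index}"

def read_sequentially(n: int, latency: float) -> list:
    results = [None] * n
    for i in range(n):
        fake_disk_read(results, i, latency)
    return results

def read_with_threads(n: int, latency: float) -> list:
    results = [None] * n
    threads = [threading.Thread(target=fake_disk_read, args=(results, i, latency))
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    for reader in (read_sequentially, read_with_threads):
        start = time.perf_counter()
        reader(5, 0.1)
        print(f"{reader.__name__}: {time.perf_counter() - start:.2f} s")
```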
There are many reasons. Wikipedia gives a decent overview on its page about threads.
Here are a few off the top of my head:
I/O bound tasks benefit from threading (especially network applications).
Hyperthreaded processors may speed up multithreaded applications even on a single core.
Threads can be instructed to wait (block) and wake up on specific events, enabling responsive event-driven programming.
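The third point, blocking until a specific event, looks like this with Python's `threading.Event` (a minimal sketch): the waiting thread consumes no CPU until another thread calls `set()`.

```python
import threading

ready = threading.Event()
log = []

def waiter() -> None:
    ready.wait()  # blocks, using no CPU, until another thread calls set()
    log.append("woke up")

thread = threading.Thread(target=waiter)
thread.start()
log.append("setting event")
ready.set()
thread.join()
print(log)  # ['setting event', 'woke up']
```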
If your program has to do several things "at the same time" then threads are a good way to go, particularly if some of those tasks are quite long-running. Otherwise you find yourself writing code that looks like an operating system scheduler inside your program, which is always a waste of time if the OS underneath you has a perfectly good one already. You'd find that your source code was mostly 'scheduler' and not much 'program', which is very inelegant. A good threaded program can be very elegant and economic in source code, which makes oneself look good and saves time.
Some runtimes get (or got) it wrong. In the early days of Ada the runtime environment would do its own thread scheduling, and it was never very satisfactory. That was partly due to the fact that whilst the Ada language spec included the concept of threads, the OSes we had back then quite often didn't provide them. Ada got a lot better when the compiler writers started using the underlying OS threads instead.
Similarly Python doesn't really properly use the underlying OS threads; it spoils it with the Global Interpreter Lock. Python has sidestepped the whole issue by going for multiprocessing instead (not necessarily a good thing on Windows hosts...).
Early versions of Windows didn't do threads either; they did cooperative multitasking. This depended on every process on the machine calling some OS routine at least now and then. Each OS routine would first consult the 'scheduler' to see if anything else was waiting to run before getting on with whatever it was supposed to be doing on behalf of the program. There were many terrible programs back then that wouldn't play ball and hogged the entire machine. You couldn't get on with playing a game of Solitaire when something else embarked on a lengthy calculation.
What's the mental model of your program?
IF it depends on multiple external inputs that can happen in unpredictable orders, and if what you want to do in response to those inputs is not simple and can overlap in time ...
THEN it makes sense to devote a separate thread to each input request, and have that thread perform the response needed by that request.
So, for example, if your program is waiting for input requests from an external channel, and each request must trigger its own protocol of outgoing and incoming messages, it can very much simplify the code to create a new thread (or re-use an old one) for each request.
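Python's standard library has this thread-per-request model built in: `socketserver.ThreadingTCPServer` starts a new thread for each connection. In the minimal sketch below, the upper-casing "protocol" and the function names are hypothetical. Because each connection has its own thread, one client holding its connection open does not stop another from being served.

```python
import socket
import socketserver
import threading

class UpperHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # Runs on this connection's own thread, so it can block waiting for
        # this client's next message without holding up other clients.
        for line in self.rfile:
            self.wfile.write(line.upper())

def start_server() -> socketserver.ThreadingTCPServer:
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperHandler)  # port 0: any free port
    server.daemon_threads = True  # don't let idle handler threads keep the program alive
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def ask(port: int, text: str) -> str:
    with socket.create_connection(("127.0.0.1", port)) as sock, sock.makefile("rwb") as f:
        f.write(text.encode() + b"\n")
        f.flush()
        return f.readline().decode().rstrip("\n")

if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    with socket.create_connection(("127.0.0.1", port)):  # idle client: its handler thread just waits
        print(ask(port, "hello"))  # still served, by another thread
    server.shutdown()
    server.server_close()
```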
Somehow people seem to enter the workforce thinking that threads are only there for speed (through parallelism).
That's one use, provided it allows multiple CPU chips to get cranking, but it is by no means the only use.

If 256 threads give better performance than 8 have I likely got the wrong approach?

I've just started programming with POSIX threads on a dual-core x86_64 Linux system. It seems that 256 threads is about the optimum for performance with the way I've done it. How could this be? Could it mean that my approach is wrong, and that a better approach would require far fewer threads and be just as fast or faster?
For further background (the program in question is a skeleton for a multi-threaded M-set image generator) see the following questions I've asked already:
Using threads, how should I deal with something which ideally should happen in sequential order?
How can my threaded image generating app get it’s data to the gui?
Perhaps I should mention that the skeleton (in which I've reproduced minimal functionality for testing and comparison) is now displaying the image, and the actual calculations are done almost twice as fast as the non-threaded program.
So if 256 threads running faster than 8 threads is not indicative of a poor approach to threading, how come 256 threads does outperform 8 threads?
The speed test case is a portion of the Mandelbrot Set located at:
xmin -0.76243636067708333333333328
xmax -0.7624335575810185185185186
ymax 0.077996663411458333333333929
calculated to a maximum of 30000 iterations.
On the non-threaded version, rendering time on my system is around 15 seconds. On the threaded version, the average time for 8 threads is 7.8 seconds, while for 256 threads it is 7.6 seconds.
Well, probably yes, you're doing something wrong.
However, there are circumstances where 256 threads would run better than 8 without you necessarily having a bad threading model. One must remember that having 8 threads does not mean all 8 threads are actually running all the time. Anytime one thread makes a blocking syscall to the operating system, the thread will stop running and wait for the result. In the meantime, another thread can often do work.
There's this myth that one can't usefully use more threads than contexts on the CPU, but that's just not true. If your threads block on a syscall, it can be critical to have another thread available to do more work. (In practice when threads block there tends to be less work to do, but this is not always the case.)
It's all very dependent on workload, and there's no one right number of threads for any particular application. Generally you never want fewer threads available than the OS will run, and that's the only true rule. (Unfortunately this can be very hard to find out, and so people tend to just fire up as many threads as contexts and then use non-blocking syscalls where possible.)
Could it be that your app is I/O-bound? How is the image data generated?
A performance improvement gained by allocating more threads than cores suggests that the CPU is not the bottleneck. If I/O access such as disk, memory or even network access are involved your results make perfect sense.
You are probably benefiting from Simultaneous Multithreading (SMT). Your operating system schedules more threads than there are cores available, and will swap out threads that are stalled waiting for resources (such as a memory load) in favour of ones that are ready to run. This can very effectively hide the latencies of your memory system from your program, and is the technique used to great effect for massive parallelization in CUDA for general-purpose GPU programming.
If you are seeing a performance increase with the jump to 256 threads, then what you are probably dealing with is a resource bottleneck. At some point, your code is waiting for some slow device (a hard disk or a network connection, for example) in order to continue. With multiple threads, waiting on this slow device isn't a problem because instead of sitting idle and twiddling its electronic thumbs, the CPU can process another thread while the first thread is waiting on the slow device. The more parallel threads that are running, the more work the CPU can do while it is waiting on something else.
If you are seeing performance improve all the way up to 256 threads, I am tempted to say that you have a major performance bottleneck somewhere and it's not the CPU. To test this, try to see if you can measure the idle time of individual threads. I suspect that you will see your threads are stuck in a "blocked" or "waiting" state for a longer portion of their lifetime than they spend in the "running" or "active" state. Some debuggers or function profiling tools will let you do this, and I think there are also Linux tools to do this on the command line.
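Measuring per-thread CPU time against wall-clock time is one rough way to do this from inside the program. This Python sketch uses `time.thread_time()` (CPU time of the current thread, Python 3.7+); the gap between wall time and CPU time is time the thread spent blocked or waiting to be scheduled. The helper names are hypothetical; on Linux, `top -H -p <pid>` shows a similar per-thread view from the command line.

```python
import threading
import time

def profile_thread(work, *args) -> tuple:
    """Run work(*args) on a new thread; return (wall seconds, CPU seconds) for it."""
    out = {}

    def runner() -> None:
        wall0, cpu0 = time.perf_counter(), time.thread_time()
        work(*args)
        out["wall"] = time.perf_counter() - wall0
        out["cpu"] = time.thread_time() - cpu0  # CPU time of this thread only

    t = threading.Thread(target=runner)
    t.start()
    t.join()
    return out["wall"], out["cpu"]

def blocked_fraction(wall: float, cpu: float) -> float:
    """Share of the thread's lifetime not spent running on a CPU."""
    return max(0.0, 1.0 - cpu / wall) if wall > 0 else 0.0

if __name__ == "__main__":
    wall, cpu = profile_thread(time.sleep, 0.5)  # stand-in for a thread stuck waiting
    print(f"wall {wall:.2f} s, cpu {cpu:.3f} s, blocked {blocked_fraction(wall, cpu):.0%}")
```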

Multithreading in Uniprocessor [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I wish to know how multithreading on a uniprocessor system is helpful. My doubt is this: when you create a thread, it takes its execution time slices from the main thread; in addition, scheduling the threads (context switching between them) also takes a considerable amount of time (preemptive kernel), and the processor executes only one thread at a time.
Many processes have their speed bound by the slow speed of I/O devices such as disks. Using multiple threads, you can do useful work even while waiting for a slow disk access to complete. Of course, if your process is not I/O bound, then multi-threading on a single processor can cause slow-downs, rather than speed-ups - it's a question of horses for courses.
It can also be helpful to the user experience to use multiple threads, even if things don't actually run faster because of it.
Nothing worse than seeing an entire window refuse to repaint when an operation is going off in the background, especially when there's a progress bar which of course becomes useless.
Because sometimes threading is the most natural way to express your program. Threads provide a way for you to represent tasks that should conceptually run at the same time, even though on a single processor they obviously can't actually run at the same time.
One common area to use threading is GUIs, for example. You don't want your GUI to be unresponsive just because there is a lot of work going on in another area of the program. So by splitting off the GUI into another thread, you can still have your GUI responsive despite a lot of computation somewhere else in your program.
If you put the heavy work in separate threads, the gui is still responsive.
Multithreading was invented because it was found that most of the time a program is waiting for I/O. If the processor is shared among other programs this idle time can be made use of. Even though some processor time is spent managing thread/processes this practice was found to be more productive than running one program at a time to the end in sequence.
It depends on the OS, but the scheduler usually considers thread priority as well. For example, for 'real-time' audio applications (e.g. recording the audio with some processing), the processing and recording is more important than the UI refreshment, since the audio signal is lost forever if you miss even a few samples.
Most "pro-grade" audio applications used multithreading long before multi-core CPUs became commonplace.
With Uniprocessor systems, multithreading helps in sharing the CPU among multiple tasks so that no one task hogs the CPU till it gets completed.
A good example is a game, where you have to do many things concurrently.
The common approach is to have a main loop where you process events, game logic, physics, graphics and sound. But if those tasks need to be interleaved in a way that isn't statically deterministic, because some of them take more than one iteration to complete (for example, you're dropping some frames but the game logic is still running), or because you need to sample sound more frequently since otherwise glitches can be heard, your game's scheduler is likely to become more and more complex...
In that case, you could just split your tasks into threads and let the OS do the scheduling job for you. But you'll need to design that very carefully, because it's very likely that all the threads have to read the same data (the world state) and one or two of them (the game logic and physics) also write it, so it's imperative to establish the proper locks.
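A minimal Python sketch of that locking discipline, assuming a hypothetical one-body "world": the physics thread updates position and velocity together under a lock, and the render thread takes its snapshot under the same lock, so it never sees a position from one step paired with a velocity from another.

```python
import threading
from dataclasses import dataclass

@dataclass
class World:
    """Hypothetical shared world state: one body moving along one axis."""
    x: float = 0.0
    vx: float = 1.0

def run_game(steps: int = 1000, dt: float = 0.01):
    world = World()
    lock = threading.Lock()
    physics_done = threading.Event()
    frames = []

    def physics() -> None:  # writer: updates both fields as one atomic step
        for _ in range(steps):
            with lock:
                world.x += world.vx * dt
                world.vx *= 0.999  # hypothetical damping
        physics_done.set()

    def render() -> None:  # reader: takes a consistent snapshot each "frame"
        while not physics_done.is_set():
            with lock:
                frames.append((world.x, world.vx))

    threads = [threading.Thread(target=physics), threading.Thread(target=render)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return world, frames

if __name__ == "__main__":
    world, frames = run_game()
    print(f"final x={world.x:.4f}, rendered {len(frames)} frames")
```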
Interestingly, when I tried a PLINQ sample (Parallel LINQ i.e. automatic multithreading expressed using LINQ expressions) on my uniprocessor PC, I still gained a roughly 2x speed increase. This baffles me, but my best guess is that it's to do with Hyperthreading. So a single-core CPU can apparently behave as though it is using simultaneous multithreaded execution. I don't really understand hyperthreading, but what I guess is happening is that a second thread is fitted into some time that the first thread would see as the CPU idling.
Worth experimenting.
Multithreading is useful on uniprocessors because, with multiple threads, a process can keep the I/O devices and the CPU busy at the same time.

Resources