Multi-threaded rendering D3D/OpenGL/Whatever

Multi-threaded rendering D3D/OpenGL/Whatever - multithreading

I've been reading a lot about multi-threaded rendering. People have been proposing all kinds of weird and wonderful schemes for submitting work to the GPU with threads in order to speed up their frame rates and get more stuff rendered, but I'm having a bit of a conceptual problem with the whole thing and I thought I'd run it by the gurus here to see what you think.
As far as I know, the basic unit of concurrency on a GPU is the Warp. That is to say, it's down at the pixel level rather than higher up at the geometry submission level. So given that the unit of concurrency on the GPU is the warp, the driver must be locked down pretty tightly with mutexes to prevent multiple threads screwing up each other's submissions. If this is the case, I don't see where the benefit is of coding to D3D or OpenGL multi-threading primitives.
Surely the most efficient method of using your GPU in a multi-threading scenario is at the higher, abstract level, where you're collecting together batches of work to do, before submitting it? I mean rather than randomly interleving commands from multiple threads, I would have thought a single block accepting work from multiple threads, but with a little intelligence inside of it to make sure things are ordered for better performance before being submitted to the renderer, would be a much bigger gain if you wanted to work with multiple threads.
So, whither D3D/OpenGL multi-threaded rendering support in the actual API?
Help me with my confusion!

Your question comes from a misunderstanding of the difference between "make their renderers multi-threaded" and "multithreaded rendering".
A "renderer", or more precisely a "rendering system," does more than just issue rendering commands to the API. It has to shuffle memory around. It may have to load textures dynamically into and out-of graphics memory. It may have to read data back after some rendering process. And so forth.
To make a renderer multithreaded means exactly that: to make the rendering system make use of multiple threads. This could be threading scene graph management tasks like building the list of objects to render (frustum culling, BSPs, portals, etc). This could be having a thread dedicated to texture storage management, which swaps textures in and out as needed, loading from disk and such. This could be as in the D3D11 case of command lists, where you build a series of rendering commands in parallel with other tasks.
The process of rendering, the submission of actual rendering commands to the API, is not threaded. You generally have one thread who is responsible for the basic glDraw* or ::DrawIndexedPrimitive work. D3D11 command lists allow you to build sequences of these commands, but they are not executed in parallel with other rendering commands. It is the rendering thread and the main context that is responsible for actually issuing the command list; the command list is just there to make putting that list together more thread-friendly.

In Direct3D 11 you generally create deferred contexts to which you make draw calls from your worker threads. Once work is complete and you are ready to render, you generate a command list from each deferred context and execute it on the immediate (front thread) context. This allows the draw calls to be composed in multiple threads whilst preserving correct ordering of the draw calls etc.

Related

Why must/should UI frameworks be single threaded?

Closely related questions have been asked before:
Why are most UI frameworks single threaded?.
Should all event-driven frameworks be single-threaded?
But the answers to those questions still leave me unclear on some points.
The asker of the first question asked if multi-threading would help performance, and the answerers mostly said that it would not, because it is very unlikely that the GUI would be the bottleneck in a 2D application on modern hardware. But this seems to me a sneaky debating tactic. Sure, if you have carefully structured your application to do nothing other than UI calls on the UI thread you won't have a bottleneck. But that might take a lot of work and make your code more complicated, and if you had a faster core or could make UI calls from multiple threads, maybe it wouldn't be worth doing.
A commonly advocated architectural design is to have view components that don't have callbacks and don't need to lock anything except maybe their descendants. Under such an architecture, can't you let any thread invoke methods on view objects, using per-object locks, without fear of deadlock?
I am less confident about the situation with UI controls, but as long their callbacks are only invoked by the system, why should they cause any special deadlock issues? After all, if the callbacks need to do anything time consuming, they will delegate to another thread, and then we're right back in the multiple threads case.
How much of the benefit of a multi-threaded UI would you get if you could just block on the UI thread? Because various emerging abstractions over async in effect let you do that.
Almost all of the discussion I have seen assumes that concurrency will be dealt with using manual locking, but there is a broad consensus that manual locking is a bad way to manage concurrency in most contexts. How does the discussion change when we take into consideration the concurrency primitives that the experts are advising us to use more, such as software transactional memory, or eschewing shared memory in favor of message passing (possibly with synchronization, as in go)?

TL;DR
It is a simple way to force sequencing to occur in an activity that is going to ultimately be in sequence anyway (the screen draw X times per second, in order).
Discussion
Handling long-held resources which have a single identity within a system is typically done by representing them with a single thread, process, "object" or whatever else represents an atomic unit with regard to concurrency in a given language. Back in the non-emptive, negligent-kernel, non-timeshared, One True Thread days this was managed manually by polling/cycling or writing your own scheduling system. In such a system you still had a 1::1 mapping between function/object/thingy and singular resources (or you went mad before 8th grade).
This is the same approach used with handling network sockets, or any other long-lived resource. The GUI itself is but one of many such resources a typical program manages, and typically long-lived resources are places where the ordering of events matters.
For example, in a chat program you would usually not write a single thread. You would have a GUI thread, a network thread, and maybe some other thread that deals with logging resources or whatever. It is not uncommon for a typical system to be so fast that its easier to just put the logging and input into the same thread that makes GUI updates, but this is not always the case. In all cases, though, each category of resources is most easily reasoned about by granting them a single thread, and that means one thread for the network, one thread for the GUI, and however many other threads are necessary for long-lived operations or resources to be managed without blocking the others.
To make life easier its common to not share data directly among these threads as much as possible. Queues are much easier to reason about than resource locks and can guarantee sequencing. Most GUI libraries either queue events to be handled (so they can be evaluated in order) or commit data changes required by events immediately, but get a lock on the state of the GUI prior to each pass of the repaint loop. It doesn't matter what happened before, the only thing that matters when painting the screen is the state of the world right then. This is slightly different than the typical network case where all the data needs to be sent in order and forgetting about some of it is not an option.
So GUI frameworks are not multi-threaded, per se, it is the GUI loop that needs to be a single thread to sanely manage that single long-held resource. Programming examples, typically being trivial by nature, are often single-threaded with all the program logic running in the same process/thread as the GUI loop, but this is not typical in more complex programs.
To sum up
Because scheduling is hard, shared data management is even harder, and a single resource can only be accessed serially anyway, a single thread used to represent each long-held resource and each long-running procedure is a typical way to structure code. GUIs are only one resource among several that a typical program will manage. So "GUI programs" are by no means single-threaded, but GUI libraries typically are.
In trivial programs there is no realized penalty to putting other program logic in the GUI thread, but this approach falls apart when significant loads are experienced or resource management requires either a lot of blocking or polling, which is why you will often see event queue, signal-slot message abstractions or other approaches to multi-threading/processing mentioned in the dusty corners of nearly any GUI library (and here I'm including game libraries -- while game libs typically expect that you want to essentially build your own widgets around your own UI concept, the basic principles are very similar, just a bit lower-level).
[As an aside, I've been doing a lot of Qt/C++ and Wx/Erlang lately. The Qt docs do a good job of explaining approaches to multi-threading, the role of the GUI loop, and where Qt's signal/slot approach fits into the abstraction (so you don't have to think about concurrency/locking/sequencing/scheduling/etc very much). Erlang is inherently concurrent, but wx itself is typically started as a single OS process that manages a GUI update loop and Erlang posts update events to it as messages, and GUI events are sent to the Erlang side as messages -- thus permitting normal Erlang concurrent coding, but providing a single point of GUI event sequencing so that wx can do its GUI update looping thing.]

Because the GUI main thread code is old. Very old and therefore very much designed for low resource usage. If someone would write everything from scratch again (and even Android as the most recent GUI OS didn't) it would be working well and be better in multithreading.
For example the best two improvements that would help for MT are
Now we have MVVM (Model-View-ViewModel) pattern, this is an extra duplication of data. When the toolskits were developed even a single duplication in a MVC was highly debated. MVVM makes multithreading much easier. IMHO this was the main reason for Microsoft to invent it in the first place in .NET not the data binding.
The scene graph approach. Android, iOS, Windows UWP (based on CoreWindow not hWnd until Windows11 Project Reunion), Gtk4 is decoupling the GPU part from the model. Yes it is in fact a MVVMGM now (Model-View-ViewModel-GPUModel). So another memory intense layer. If you duplicate stuff you need less synchronisation. Combine on Android and SwiftUI on MacOS/iOS is using immutability of GUI widgets now to further improve this View->GPUModel.
Especially with the GPU Model/Scene Graph, the statement that GUIs are single threaded is not true anymore.

Two reasons, as far as I can tell:
It is much easier to reason about single-threaded code; thus the event loop model reduces the likelihood of bugs.
2D User interfaces are not CPU intensive. An old computer with a wimpy graphics card can smoothly render all the windows, frames, widgets, etc. you could possibly desire without skipping a beat.
Basically, if single-threaded code is easier and tends to have fewer bugs, favor that over multithreaded code unless you have a compelling need for parallelization or speed. Your typical GUI frameworks don't have this need.
Now, of course we've all experienced lagginess and freezes from GUI applications before. I'd argue that the vast majority of the time, this is the fault of the developer: putting long-running synchronous code for an event that should have been handled asynchronously (which is a mechanism all the major UI frameworks have).

Does multiple isolated OpenGL context affect performance

My co-worker and I are working on a video rendering engine.
The whole idea is to parse a configuration file and render each frame to offscreen FBO, and then fetch the frame render results using glReadPixel for video encoding.
We tried to optimize the rendering speed by creating two threads each with an independent OpenGL context. One thread render odd frames and the other render even frames. The two threads do not share any gl resources.
The results are quite confusing. On my computer, the rendering speed increased compared to our single thread implementation, while on my partner's computer, the entire speed dropped.
I wonder here that, how do the amount of OpenGL contexts affect the overall performance. Is it really a good idea to create multiple OpenGL threads if they do not share anything.

Context switching is certainly not free. As pretty much always for performance related questions, it's impossible to quantify in general terms. If you want to know, you need to measure it on the system(s) you care about. It can be quite expensive.
Therefore, you add a certain amount of overhead by using multiple contexts. If that pays off depends on where your bottleneck is. If you were already GPU limited with a single CPU thread, you won't really gain anything because you can't get the GPU to do the work quicker if it was already fully loaded. Therefore you add overhead for the context switches without any gain, and make the whole thing slower.
If you were CPU limited, using multiple CPU threads can reduce your total elapsed time. If the parallelization of the CPU work combined with the added overhead for synchronization and context switches results in a net total gain again depends on your use case and the specific system. Trying both and measuring is the only good thing to do.
Based on your problem description, you might also be able to use multithreading while still sticking with a single OpenGL context, and keeping all OpenGL calls in a single thread. Instead of using glReadPixels() synchronously, you could have it read into PBOs (Pixel Buffer Objects), which allows you to use asynchronous reads. This decouples GPU and CPU work much better. You could also do the video encoding on a separate thread if you're not doing that yet. This approach will need some inter-thread synchronization, but it avoids using multiple contexts, while still using parallel processing to get the job done.

How to articulate the difference between asynchronous and parallel programming?

Many platforms promote asynchrony and parallelism as means for improving responsiveness. I understand the difference generally, but often find it difficult to articulate in my own mind, as well as for others.
I am a workaday programmer and use async & callbacks fairly often. Parallelism feels exotic.
But I feel like they are easily conflated, especially at the language design level. Would love a clear description of how they relate (or don't), and the classes of programs where each is best applied.

When you run something asynchronously it means it is non-blocking, you execute it without waiting for it to complete and carry on with other things. Parallelism means to run multiple things at the same time, in parallel. Parallelism works well when you can separate tasks into independent pieces of work.
Take for example rendering frames of a 3D animation. To render the animation takes a long time so if you were to launch that render from within your animation editing software you would make sure it was running asynchronously so it didn't lock up your UI and you could continue doing other things. Now, each frame of that animation can also be considered as an individual task. If we have multiple CPUs/Cores or multiple machines available, we can render multiple frames in parallel to speed up the overall workload.

I believe the main distinction is between concurrency and parallelism.
Async and Callbacks are generally a way (tool or mechanism) to express concurrency i.e. a set of entities possibly talking to each other and sharing resources.
In the case of async or callback communication is implicit while sharing of resources is optional (consider RMI where results are computed in a remote machine).
As correctly noted this is usually done with responsiveness in mind; to not wait for long latency events.
Parallel programming has usually throughput as the main objective while latency, i.e. the completion time for a single element, might be worse than a equivalent sequential program.
To better understand the distinction between concurrency and parallelism I am going to quote from Probabilistic models for concurrency of Daniele Varacca which is a good set of notes for theory of concurrency:
A model of computation is a model for concurrency when it is able to represent systems as composed of independent autonomous components, possibly communicating with each other. The notion of concurrency should not be confused with the notion of parallelism. Parallel computations usually involve a central control which distributes the work among several processors. In concurrency we stress the independence of the components, and the fact that they communicate with each other. Parallelism is like ancient Egypt, where the Pharaoh decides and the slaves work. Concurrency is like modern Italy, where everybody does what they want, and all use mobile phones.
In conclusion, parallel programming is somewhat a special case of concurrency where separate entities collaborate to obtain high performance and throughput (generally).
Async and Callbacks are just a mechanism that allows the programmer to express concurrency.
Consider that well-known parallel programming design patterns such as master/worker or map/reduce are implemented by frameworks that use such lower level mechanisms (async) to implement more complex centralized interactions.

This article explains it very well: http://urda.cc/blog/2010/10/04/asynchronous-versus-parallel-programming
It has this about asynchronous programming:
Asynchronous calls are used to prevent “blocking” within an application. [Such a] call will spin-off in an already existing thread (such as an I/O thread) and do its task when it can.
this about parallel programming:
In parallel programming you still break up work or tasks, but the key differences is that you spin up new threads for each chunk of work
and this in summary:
asynchronous calls will use threads already in use by the system and parallel programming requires the developer to break the work up, spinup, and teardown threads needed.

async: Do this by yourself somewhere else and notify me when you complete(callback). By the time i can continue to do my thing.
parallel: Hire as many guys(threads) as you wish and split the job to them to complete quicker and let me know(callback) when you complete. By the time i might continue to do my other stuff.
the main difference is parallelism mostly depends on hardware.

My basic understanding is:
Asynchonous programming solves the problem of waiting around for an expensive operation to complete before you can do anything else. If you can get other stuff done while you're waiting for the operation to complete then that's a good thing. Example: keeping a UI running while you go and retrieve more data from a web service.
Parallel programming is related but is more concerned with breaking a large task into smaller chunks that can be computed at the same time. The results of the smaller chunks can then be combined to produce the overall result. Example: ray-tracing where the colour of individual pixels is essentially independent.
It's probably more complicated than that, but I think that's the basic distinction.

I tend to think of the difference in these terms:
Asynchronous: Go away and do this task, when you're finished come back and tell me and bring the results. I'll be getting on with other things in the mean time.
Parallel: I want you to do this task. If it makes it easier, get some folks in to help. This is urgent though, so I'll wait here until you come back with the results. I can do nothing else until you come back.
Of course an asynchronous task might make use of parallelism, but the differentiation - to my mind at least - is whether you get on with other things while the operation is being carried out or if you stop everything completely until the results are in.

It is a question of order of execution.
If A is asynchronous with B, then I cannot predict beforehand when subparts of A will happen with respect to subparts of B.
If A is parallel with B, then things in A are happening at the same time as things in B. However, an order of execution may still be defined.
Perhaps the difficulty is that the word asynchronous is equivocal.
I execute an asynchronous task when I tell my butler to run to the store for more wine and cheese, and then forget about him and work on my novel until he knocks on the study door again. Parallelism is happening here, but the butler and I are engaged in fundamentally different tasks and of different social classes, so we don't apply that label here.
My team of maids is working in parallel when each of them is washing a different window.
My race car support team is asynchronously parallel in that each team works on a different tire and they don't need to communicate with each other or manage shared resources while they do their job.
My football (aka soccer) team does parallel work as each player independently processes information about the field and moves about on it, but they are not fully asynchronous because they must communicate and respond to the communication of others.
My marching band is also parallel as each player reads music and controls their instrument, but they are highly synchronous: they play and march in time to each other.
A cammed gatling gun could be considered parallel, but everything is 100% synchronous, so it is as though one process is moving forward.

Why Asynchronous ?
With today's application's growing more and more connected and also potentially
long running tasks or blocking operations such as Network I/O or Database Operations.So it's very important to hide the latency of these operations by starting them in background and returning back to the user interface quickly as possible. Here Asynchronous come in to the picture, Responsiveness.
Why parallel programming?
With today's data sets growing larger and computations growing more complex. So it's very important to reduce the execution time of these CPU-bound operations, in this case, by dividing the workload into chunks and then executing those chunks simultaneously. We can call this as "Parallel" .
Obviously it will give high Performance to our application.

Asynchronous
Let's say you are the point of contact for your client and you need to be responsive i.e. you need to share status, complexity of operation, resources required etc whenever asked. Now you have a time-consuming operation to be done and hence cannot take this up as you need to be responsive to the client 24/7. Hence, you delegate the time-consuming operation to someone else so that you can be responsive. This is asynchronous.
Parallel programming
Let's say you have a task to read, say, 100 lines from a text file, and reading one line takes 1 second. Hence, you'll require 100 seconds to read the text file. Now you're worried that the client must wait for 100 seconds for the operation to finish. Hence you create 9 more clones and make each of them read 10 lines from the text file. Now the time taken is only 10 seconds to read 100 lines. Hence you have better performance.
To sum up, asynchronous coding is done to achieve responsiveness and parallel programming is done for performance.

Asynchronous: Running a method or task in background, without blocking. May not necessorily run on a separate thread. Uses Context Switching / time scheduling.
Parallel Tasks: Each task runs parallally. Does not use context switching / time scheduling.

I came here fairly comfortable with the two concepts, but with something not clear to me about them.
After reading through some of the answers, I think I have a correct and helpful metaphor to describe the difference.
If you think of your individual lines of code as separate but ordered playing cards (stop me if I am explaining how old-school punch cards work), then for each separate procedure written, you will have a unique stack of cards (don't copy & paste!) and the difference between what normally goes on when run code normally and asynchronously depends on whether you care or not.
When you run the code, you hand the OS a set of single operations (that your compiler or interpreter broke your "higher" level code into) to be passed to the processor. With one processor, only one line of code can be executed at any one time. So, in order to accomplish the illusion of running multiple processes at the same time, the OS uses a technique in which it sends the processor only a few lines from a given process at a time, switching between all the processes according to how it sees fit. The result is multiple processes showing progress to the end user at what seems to be the same time.
For our metaphor, the relationship is that the OS always shuffles the cards before sending them to the processor. If your stack of cards doesn't depend on another stack, you don't notice that your stack stopped getting selected from while another stack became active. So if you don't care, it doesn't matter.
However, if you do care (e.g., there are multiple processes - or stacks of cards - that do depend on each other), then the OS's shuffling will screw up your results.
Writing asynchronous code requires handling the dependencies between the order of execution regardless of what that ordering ends up being. This is why constructs like "call-backs" are used. They say to the processor, "the next thing to do is tell the other stack what we did". By using such tools, you can be assured that the other stack gets notified before it allows the OS to run any more of its instructions. ("If called_back == false: send(no_operation)" - not sure if this is actually how it is implemented, but logically, I think it is consistent.)
For parallel processes, the difference is that you have two stacks that don't care about each other and two workers to process them. At the end of the day, you may need to combine the results from the two stacks, which would then be a matter of synchronicity but, for execution, you don't care again.
Not sure if this helps but, I always find multiple explanations helpful. Also, note that asynchronous execution is not constrained to an individual computer and its processors. Generally speaking, it deals with time, or (even more generally speaking) an order of events. So if you send dependent stack A to network node X and its coupled stack B to Y, the correct asynchronous code should be able to account for the situation as if it was running locally on your laptop.

Generally, there are only two ways you can do more than one thing each time. One is asynchronous, the other is parallel.
From the high level, like the popular server NGINX and famous Python library Tornado, they both fully utilize asynchronous paradigm which is Single thread server could simultaneously serve thousands of clients (some IOloop and callback). Using ECF(exception control follow) which could implement the asynchronous programming paradigm. so asynchronous sometimes doesn't really do thing simultaneous, but some io bound work, asynchronous could really promotes the performance.
The parallel paradigm always refers multi-threading, and multiprocessing. This can fully utilize multi-core processors, do things really simultaneously.

Summary of all above answers
parallel computing:
▪ solves throughput issue.
Concerned with breaking a large task into smaller chunks
▪ is machine related (multi machine/core/cpu/processor needed), eg: master slave, map reduce.
Parallel computations usually involve a central control which distributes the work among several processors
asynchronous:
▪ solves latency issue
ie, the problem of 'waiting around' for an expensive operation to complete before you can do anything else
▪ is thread related (multi thread needed)
Threading (using Thread, Runnable, Executor) is one fundamental way to perform asynchronous operations in Java

interesting thread synchronization problem

I am trying to come up with a synchronization model for the following scenario:
I have a GUI thread that is responsible for CPU intensive animation + blocking I/O. The GUI thread retrieves images from the network (puts them in a shared buffer) , these images are processed (CPU intensive operation..done by a worker thread) and then these images are animated ( again CPU intensive..done by the GUI thread).
The processing of images is done by a worker thread..it retrieves images from the shared buffer processes them and puts them in an output buffer.
There is only once CPU and the GUI thread should not get scheduled out while it is animating the images (the animation has to be really smooth). This means that the work thread should get the CPU only when the GUI thread is waiting for I/O operation to complete.
How do i go about achieving this? This looks like a classic producer consumer problem...but i am not quite sure how i can guarantee that the animation will be as smooth as possible ( i am open to using more threads).
I would like to use QThreads (Qt framework) for platform independence but i can consider pthreads for more control ( as currently we are only aiming for linux).
Any ideas?
EDIT:
i guess the problems boils down to one thing..how do i ensure that the animation thread is not interrupted while it is animating the images ( the animation runs when the user goes from one page to the other..all the images in the new page are animated before shown in their proper place..this is a small operation but it must be really smooth).The worker thread can only run when the animation is over..

Just thinking out loud here, but it sounds like you have two compute-intensive tasks, animation and processing, and you want animation to always have priority over processing. If that is correct then maybe instead of having these tasks in separate threads you could have a single thread that handles both animation and processing.
For instance, the thread could have two task-queues, one for animation jobs and one for processing jobs, and it only starts a job from the processing queue when the animation queue is empty. But, this will only work well if each individual processing job is relatively small and/or interruptible at arbitrary positions (otherwise animation jobs will get delayed, which is not what you want).

The first big question is: Do I really need threads? Qt 's event system and network objects make it easy to not having the technical burden of threads and all the snags that comes with it.
Have a look at alternative ways to address issues here and here. These techniques great if you are sticking to pure Qt code and do not depend on a 3rd party library. If you must use a 3rd party lib that does blocking calls then sure, you can use threads.
Here is an example of a consumer producer.
Also have a look at Advanced Qt Programming: Creating Great Software with C++ and Qt 4
My advice is to start without threads and see how it fares. You can always refactor to threads after. So, best is to design your objects/architecture without too much coupling.
If you want you can post some code to give more context.

What kinds of applications need to be multi-threaded?

What are some concrete examples of applications that need to be multi-threaded, or don't need to be, but are much better that way?
Answers would be best if in the form of one application per post that way the most applicable will float to the top.

There is no hard and fast answer, but most of the time you will not see any advantage for systems where the workflow/calculation is sequential. If however the problem can be broken down into tasks that can be run in parallel (or the problem itself is massively parallel [as some mathematics or analytical problems are]), you can see large improvements.
If your target hardware is single processor/core, you're unlikely to see any improvement with multi-threaded solutions (as there is only one thread at a time run anyway!)
Writing multi-threaded code is often harder as you may have to invest time in creating thread management logic.
Some examples
Image processing can often be done in parallel (e.g. split the image into 4 and do the work in 1/4 of the time) but it depends upon the algorithm being run to see if that makes sense.
Rendering of animation (from 3DMax,etc.) is massively parallel as each frame can be rendered independently to others -- meaning that 10's or 100's of computers can be chained together to help out.
GUI programming often helps to have at least two threads when doing something slow, e.g. processing large number of files - this allows the interface to remain responsive whilst the worker does the hard work (in C# the BackgroundWorker is an example of this)
GUI's are an interesting area as the "responsiveness" of the interface can be maintained without multi-threading if the worker algorithm keeps the main GUI "alive" by giving it time, in Windows API terms (before .NET, etc) this could be achieved by a primitive loop and no need for threading:
MSG msg;
while(GetMessage(&msg, hwnd, 0, 0))
{
TranslateMessage(&msg);
DispatchMessage(&msg);
// do some stuff here and then release, the loop will come back
// almost immediately (unless the user has quit)
}

Servers are typically multi-threaded (web servers, radius servers, email servers, any server): you usually want to be able to handle multiple requests simultaneously. If you do not want to wait for a request to end before you start to handle a new request, then you mainly have two options:
Run a process with multiple threads
Run multiple processes
Launching a process is usually more resource-intensive than lauching a thread (or picking one in a thread-pool), so servers are usually multi-threaded. Moreover, threads can communicate directly since they share the same memory space.
The problem with multiple threads is that they are usually harder to code right than multiple processes.

There are really three classes of reasons that multithreading would be applied:
Execution Concurrency to improve compute performance: If you have a problem that can be broken down into pieces and you also have more than one execution unit (processor core) available then dispatching the pieces into separate threads is the path to being able to simultaneously use two or more cores at once.
Concurrency of CPU and IO Operations: This is similar in thinking to the first one but in this case the objective is to keep the CPU busy AND also IO operations (ie: disk I/O) moving in parallel rather than alternating between them.
Program Design and Responsiveness: Many types of programs can take advantage of threading as a program design benefit to make the program more responsive to the user. For example the program can be interacting via the GUI and also doing something in the background.
Concrete Examples:
Microsoft Word: Edit document while the background grammar and spell checker works to add all the green and red squiggle underlines.
Microsoft Excel: Automatic background recalculations after cell edits
Web Browser: Dispatch multiple threads to load each of the several HTML references in parallel during a single page load. Speeds page loads and maximizes TCP/IP data throughput.

These days, the answer should be Any application that can be.
The speed of execution for a single thread pretty much peaked years ago - processors have been getting faster by adding cores, not by increasing clock speeds. There have been some architectural improvements that make better use of the available clock cycles, but really, the future is taking advantage of threading.
There is a ton of research going on into finding ways of parallelizing activities that we traditionally wouldn't think of parallelizing. Even something as simple as finding a substring within a string can be parallelized.

Basically there are two reasons to multi-thread:
To be able to do processing tasks in parallel. This only applies if you have multiple cores/processors, otherwise on a single core/processor computer you will slow the task down compared to the version without threads.
I/O whether that be networked I/O or file I/O. Normally if you call a blocking I/O call, the process has to wait for the call to complete. Since the processor/memory are several orders of magnitude quicker than a disk drive (and a network is even slower) it means the processor will be waiting a long time. The computer will be working on other things but your application will not be making any progress. However if you have multiple threads, the computer will schedule your application and the other threads can execute. One common use is a GUI application. Then while the application is doing I/O the GUI thread can keep refreshing the screen without looking like the app is frozen or not responding. Even on a single processor putting I/O in a different thread will tend to speed up the application.
The single threaded alternative to 2 is to use asynchronous calls where they return immediately and you keep controlling your program. Then you have to see when the I/O completes and manage using it. It is often simpler just to use a thread to do the I/O using the synchronous calls as they tend to be easier.
The reason to use threads instead of separate processes is because threads should be able to share data easier than multiple processes. And sometimes switching between threads is less expensive than switching between processes.
As another note, for #1 Python threads won't work because in Python only one python instruction can be executed at a time (known as the GIL or Global Interpreter Lock). I use that as an example but you need to check around your language. In python if you want to do parallel calculations, you need to do separate processes.

Many GUI frameworks are multi-threaded. This allows you to have a more responsive interface. For example, you can click on a "Cancel" button at any time while a long calculation is running.
Note that there are other solutions for this (for example the program can pause the calculation every half-a-second to check whether you clicked on the Cancel button or not), but they do not offer the same level of responsiveness (the GUI might seem to freeze for a few seconds while a file is being read or a calculation being done).

All the answers so far are focusing on the fact that multi-threading or multi-processing are necessary to make the best use of modern hardware.
There is however also the fact that multithreading can make life much easier for the programmer. At work I program software to control manufacturing and testing equipment, where a single machine often consists of several positions that work in parallel. Using multiple threads for that kind of software is a natural fit, as the parallel threads model the physical reality quite well. The threads do mostly not need to exchange any data, so the need to synchronize threads is rare, and many of the reasons for multithreading being difficult do therefore not apply.
Edit:
This is not really about a performance improvement, as the (maybe 5, maybe 10) threads are all mostly sleeping. It is however a huge improvement for the program structure when the various parallel processes can be coded as sequences of actions that do not know of each other. I have very bad memories from the times of 16 bit Windows, when I would create a state machine for each machine position, make sure that nothing would take longer than a few milliseconds, and constantly pass the control to the next state machine. When there were hardware events that needed to be serviced on time, and also computations that took a while (like FFT), then things would get ugly real fast.

Not directly answering your question, I believe in the very near future, almost every application will need to be multithreaded. The CPU performance is not growing that fast these days, which is compensated for by the increasing number of cores. Thus, if we will want our applications to stay on the top performance-wise, we'll need to find ways to utilize all your computer's CPUs and keep them busy, which is quite a hard job.
This can be done via telling your programs what to do instead of telling them exactly how. Now, this is a topic I personally find very interesting recently. Some functional languages, like F#, are able to parallelize many tasks quite easily. Well, not THAT easily, but still without the necessary infrastructure needed in more procedural-style environments.
Please take this as additional information to think about, not an attempt to answer your question.

The kind of applications that need to be threaded are the ones where you want to do more than one thing at once. Other than that no application needs to be multi-threaded.

Applications with a large workload which can be easily made parallel. The difficulty of taking your application and doing that should not be underestimated. It is easy when your data you're manipulating is not dependent upon other data but v. hard to schedule the cross thread work when there is a dependency.
Some examples I've done which are good multithreaded candidates..
running scenarios (eg stock derivative pricing, statistics)
bulk updating data files (eg adding a value / entry to 10,000 records)
other mathematical processes

E.g., you want your programs to be multithreaded when you want to utilize multiple cores and/or CPUs, even when the programs don't necessarily do many things at the same time.
EDIT: using multiple processes is the same thing. Which technique to use depends on the platform and how you are going to do communications within your program, etc.

Although frivolous, games, in general are becomming more and more threaded every year. At work our game uses around 10 threads doing physics, AI, animation, redering, network and IO.

Just want to add that caution must be taken with treads if your sharing any resources as this can lead to some very strange behavior, and your code not working correctly or even the threads locking each other out.
mutex will help you there as you can use mutex locks for protected code regions, a example of protected code regions would be reading or writing to shared memory between threads.
just my 2 cents worth.

The main purpose of multithreading is to separate time domains. So the uses are everywhere where you want several things to happen in their own distinctly separate time domains.

HERE IS A PERFECT USE CASE
If you like affiliate marketing multi-threading is essential. Kick the entire process off via a multi-threaded application.
Download merchant files via FTP, unzipping the files, enumerating through each file performing cleanup like EOL terminators from Unix to PC CRLF then slam each into SQL Server via Bulk Inserts then when all threads are complete create the full text search indexes for a environmental instance to be live tomorrow and your done. All automated to kick off at say 11:00 pm.
BOOM! Fast as lightening. Heck you have so much time left you can even download merchant images locally for the products you download, save the images as webp and set the product urls to use local images.
Yep I did it. Wrote it in C#. Works like a charm. Purchase a AMD Ryzen Threadripper 64-core with 256gb memory and fast drives like nvme, get lunch come back and see it all done or just stay around and watch all cores peg to 95%+, listen to the pc's fans kick, warm up the room and the look outside as the neighbors lights flicker from the power drain as you get shit done.
Future would be to push processing to GPU's as well.
Ok well I am pushing it a little bit with the neighbors lights flickering but all else was absolutely true. :)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string