Node.js web scraping optimization [closed]

Closed 7 years ago.
Tasks:
Scrape html from a webpage
Parse the html
Clean the data (remove white space, perform basic regex)
Persist the data to a SQL database.
Goal is to complete these 4 tasks as quickly as possible and here are some possible example approaches.
Possible Sample Approaches
Multi-Step 1: Scrape all pages and store html as .txt files. After all html is stored as text, run a separate module that parses/cleans/persists the data.
Multi-step 2: Scrape/Parse/Clean data and store in .txt files. Run a separate module to insert the data into a database.
Single-Step: Scrape/Parse/Clean/Persist data all in one step.
Assumptions:
1 dedicated server being used for scraping
disk space is unlimited
internet connection is your average home connection
memory (8GB)
No rate limiting on any web pages
User wants to scrape 1 million pages
I haven't done enough testing with node.js to establish a best practice but any insight on optimizing these tasks would be greatly appreciated.
Obviously, there are some unanswered questions (how much html is on a typical page, how much you are parsing, request/response latency, what frameworks are being used to parse data, etc.), but a high-level best practice or set of key considerations would be beneficial. Thanks.

With a problem like this, you can foresee only certain aspects of what will really control where your bottlenecks will be. So, you start with a smart, but not complicated, implementation and you spend a fair amount of time figuring out how you can measure your performance and where the bottlenecks are.
Then, based on the knowledge of where the bottlenecks are, you come up with a proposed design change, implement that change and see how much of a difference you made in your overall throughput. You then instrument again, measure again and see where your new bottleneck is, come up with a new theory on how to beat that bottleneck, implement, measure, theorize, iterate, etc...
You really don't want to overdesign or overcomplicate the first implementation because it's very easy to be wrong about where you think the real bottleneck will be.
So, I'd probably start out with a design like this:
Create one node.js process that does absolutely nothing but download pages and write them to disk. Use nothing but async I/O everywhere and make it configurable for how many simultaneous page downloads it has in flight at once. Do no parsing, just write the raw data to disk. You will want to find some very fast way of storing which URL is which file. That could be something as simple as appending info to a text file or it could be a database write, but the idea is you just want it to be fast.
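To make the "configurable number of in-flight downloads" idea concrete, here is a minimal sketch. The names mapWithLimit and downloadAll, the file naming scheme and the index.tsv file are all made up for illustration, and downloadAll assumes Node 18+ for the global fetch. It is one possible shape, not a tuned implementation.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Run `worker` over `items` with at most `limit` calls in flight at once.
// If any worker rejects, the returned promise rejects with that error.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}

// Hypothetical downloader: raw bytes go to <outDir>/<i>.html and the
// URL-to-file mapping is appended to a plain text index.
function downloadAll(urls, limit, outDir) {
  return mapWithLimit(urls, limit, async (url, i) => {
    const res = await fetch(url); // assumes Node 18+
    const body = Buffer.from(await res.arrayBuffer());
    await fs.promises.writeFile(path.join(outDir, `${i}.html`), body);
    await fs.promises.appendFile(path.join(outDir, 'index.tsv'), `${i}\t${url}\n`);
  });
}
```

The limit is the knob you would turn while measuring: raise it until throughput stops improving.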
Then, create another node.js process that repeatedly grabs files from disk, parses them, cleans the data and persists the data to your SQL database.
Run the first node.js process by itself and let it run until it collects 1,000 web pages or 15 minutes have passed (whichever comes first) to measure how much throughput you're initially capable of. While it's running, note the CPU utilization and the network utilization on your computer. If you're already in the ballpark of what you might need for this first node.js process, then you're done with the first node.js process. If you want it to go much faster, then you need to figure out where your bottleneck is. If you're CPU-bound (unlikely for this I/O task), then you can cluster and run multiple of these node.js processes, giving each one a set of URLs to fetch and a separate place to write its collected data. More than likely you're I/O bound. This may be either because you aren't fully saturating your existing network connection (the node.js process spends too much time waiting for I/O) or because you have already saturated your network connection and it is now the bottleneck. You will have to figure out which of these it is. If you add more simultaneous web page fetches and the performance does not increase or even goes down, then you've probably already saturated your network connection. You will also have to watch out for saturating the file I/O sub-system in node.js, which uses a limited thread pool to implement async I/O.
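One way to run that "add more simultaneous fetches and see if it helps" experiment is a small sweep over concurrency levels. This is only a sketch: sweep and runBatch are hypothetical names, and runBatch stands in for whatever function fetches a fixed number of pages at a given concurrency. For the file I/O side, note that Node's fs calls run on libuv's thread pool, which defaults to 4 threads and can be raised with the UV_THREADPOOL_SIZE environment variable.

```javascript
// Time the same batch at several concurrency levels and report pages/second.
// `runBatch(level, pageCount)` is a caller-supplied async function (an
// assumption of this sketch) that fetches `pageCount` pages with `level`
// downloads in flight.
async function sweep(levels, pageCount, runBatch) {
  const results = [];
  for (const level of levels) {
    const start = process.hrtime.bigint();
    await runBatch(level, pageCount);
    const seconds = Number(process.hrtime.bigint() - start) / 1e9;
    results.push({ level, pagesPerSecond: pageCount / seconds });
  }
  return results;
}
```

If pagesPerSecond flattens or drops as the level rises, something other than the in-flight count is the limit, most likely the network link.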
For the second node.js process, you follow a similar process. Give it 1,000 web pages and see how fast it can process them all. Since you do have I/O to read the files from disk and to write to the database, you will want to have more than one page parsing at a time so you can maximize usage of the CPU while another page is being read in or written out. You can either write one node.js process to handle multiple parse jobs at once or you can cluster a single node.js process. If you have multiple CPUs in your server, then you will probably want to have at least as many processes as you have CPUs. Unlike the URL fetcher process, the code for parsing is likely something that could be seriously optimized to be faster. But, like other performance issues, don't try to overly optimize that code until you know you are CPU bound and it is holding you up.
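As a rough picture of the per-page work in this second process, here is a sketch. cleanText and processPage are hypothetical names, save stands in for whatever database insert you end up using, and stripping tags with regexes is a deliberately crude stand-in for a real HTML parser.

```javascript
const fs = require('node:fs');

// Crude cleanup: drop script/style blocks, strip remaining tags,
// collapse runs of whitespace. A real parser would be more robust.
function cleanText(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/&nbsp;/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Read one stored page, clean it, and hand it to a caller-supplied
// `save(record)` function (an assumption; e.g. a SQL insert).
async function processPage(filePath, save) {
  const html = await fs.promises.readFile(filePath, 'utf8');
  await save({ file: filePath, text: cleanText(html) });
}
```

Because processPage awaits its I/O, several of them can be in flight at once and the CPU-bound cleanText calls interleave with the reads and writes.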
Then, if your SQL database can be on another box or at least using another disk, that's probably a good thing because it separates out the disk writes there from your other disk writes.
Where you go after the first couple steps will depend entirely upon what you learn from the first few steps. Your ability to measure where the bottlenecks are and design quick experiments to test bottleneck theories will be hugely important for making rapid progress and not wasting development time on the wrong optimizations.
FYI, some home internet connection ISPs may set off some alarms with the amount and rate of your data requests. What they will do with that info likely varies a lot from one ISP to the next. I would think that most ultimately have some ability to rate limit your connection to protect the quality of service for others sharing your same pipe, but I don't know when/if they would do that.
This sounds like a really fun project to try to optimize and get the most out of. It would make a great final project for a medium to advanced software class.

Related

What should nodejs NOT be doing?

I'm now a couple of weeks into my node deep dive. I've learned a lot from Anthony's excellent course on udemy and I'm currently going through a book " nodejs the right way". I've also gone through quite a few articles that brought up some very good points about real world scenarios with node and coupling other technologies out there.
HOWEVER, it seems to be accepted as law that you don't perform computationally heavy tasks with Node, as it's a single-threaded architecture. I get the idea of the event loop and async callbacks etc. In fact, Node's strength stems from handling tons of concurrent IO connections, if I understand correctly. No matter where I'm reading, though, the source warns against hanging up that thread executing a task. I can't seem to find any rule of thumb for things to avoid using Node's process for. I've seen a solution saying that Node should pass computationally heavy tasks to a message service like RabbitMQ, which a dedicated app server can churn through (any suggestions on what pairs well with Node for this task? I read something about an N-tier architecture). The reason I'm so confused is because I see node being used for reading and writing files to highlight the usage of streams but in my mind fetching/reading/writing files is an expensive task (I feel mistaken).
Tl;Dr What kind of tasks should node pass off to a work horse server ? What material can I read that explains the paradigm in detail?
Edit: it seems like my lack of understanding stemmed from not knowing what would even halt a thread in the first place outside of an obviously synchronous IO request . So if I understand correctly reading and writing data is IO where mutating said data or doing mathematical computations is computationally expensive (at varying levels depending on the task of course ) . Thanks for all the answers!

If you're using node.js as a server, then running a long running synchronous computational task ties up the one thread and during that computation, your server is non-responsive to other requests. That's generally a bad situation for servers.
So, the general design principles for node.js server design are these:
1. Use only asynchronous I/O functions. For example, use fs.readFile(), not fs.readFileSync().
2. If you have a computationally intense operation, then move it to a child process. If you do a lot of these, then have several child processes that can process these long running operations. This keeps your main thread free so it can be responsive to I/O requests.
3. If you want to increase the overall scalability and responsiveness of your server, you can implement clustering with a server process per CPU. This isn't really a substitute for #2 above, but can also improve scalability and responsiveness.
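A minimal sketch of moving a computation into a child process follows. To keep it in one file, the heavy job is a small script string passed to a second node via -e. The job itself (summing 1..n) and the names HEAVY_SCRIPT and sumInChild are made up for illustration. In a real server you would more likely fork a separate worker module and keep it running.

```javascript
const { execFile } = require('node:child_process');

// Hypothetical CPU-heavy job, run in a separate node process.
// It reads n from its last command-line argument.
const HEAVY_SCRIPT = `
  const n = Number(process.argv[process.argv.length - 1]);
  let sum = 0;
  for (let i = 1; i <= n; i++) sum += i;
  process.stdout.write(String(sum));
`;

// Resolves with the child's result; the parent's event loop stays free
// to serve other requests while the child computes.
function sumInChild(n) {
  return new Promise((resolve, reject) => {
    execFile(process.execPath, ['-e', HEAVY_SCRIPT, String(n)], (err, stdout) => {
      if (err) return reject(err);
      resolve(Number(stdout));
    });
  });
}
```

Starting a process per job has its own cost, which is why the advice above is to keep several long-lived child processes around if you do this a lot.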
The reason I'm so confused is because I see node being used for
reading and writing files to highlight the usage of streams but in my
mind fetching/reading/writing files is an expensive task (I feel
mistaken).
If you use the asynchronous versions of the I/O functions, then read/writing from the disk does not block the main JS thread as they present an asynchronous interface and the main thread can do other things while the system is fetching data from the disk.
What kind of tasks should node pass off to a work horse server ?
It depends a bit on the server load that you are trying to support, what you're asking it to do and your tolerance for responsiveness delays. The higher the load you're aiming for, then the more you need to get any computationally intensive task off the main JS thread and into some other process. At a medium number of long running transactions and a modest server load, you may just be able to use clustering to reach your scalability and responsiveness goal, but at some threshold of either length of the transaction or the load you're trying to support, you have to get the computationally intensive stuff out of the main JS thread.

HOWEVER, it seems to be accepted as law, that you don't perform computationally heavy tasks with Node as its a single thread architecture.
I would reword this:
don't perform computationally heavy tasks unless you need to with Node
Sometimes, you need to crunch through a bunch of data. There are times when it's faster or better to do that in-process than it is to pass it around.
A practical example:
I have a Node.js server that reads in raw log data from a bunch of servers. No standard logging utilities could be used as I have some custom processing being done, as well as custom authentication schemes for getting the log data. The whole thing is HTTP requests, and then parsing and re-writing the data.
As you can imagine, this uses a ton of CPU. Here's the thing though... is that CPU wasted? Am I doing anything in JS that I could do faster had I written it in another language? Often times the CPU is busy for a real reason, and the benefit of switching to something more native might be marginal. And then, you have to factor in the overhead of switching.
Remember that with Node.js, you can compile native extensions, so it's possible to have the best of both worlds in a well established framework.
For me, the human trade-offs came in. I'm a far more efficient Node.js developer than I am with anything that runs natively. Even if my Node.js app were to prove 5x slower than something native (which I'd imagine would be on the extreme end), I could just buy 5 more servers to run it, at much less cost than it would take for me to develop and maintain the native solution.
Use what you need. If you need to burn a lot of CPU in Node.js, just make sure you're doing it as efficiently as you can. If you find that you could optimize something with native code, consider making an extension and be sure to measure the performance differences afterwards. If you feel the desire to throw out the whole stack... reconsider your approach, as there might be something you're not considering.

Reading and writing files are I/O operations, so they are NOT CPU intensive. You can do a fair amount of concurrent I/O with Node without tying up any single request (in a Node HTTP server for example).
But people use Node for CPU-intensive tasks all the time and it's fine. You just have to realize that if it uses all of the CPU for any significant amount of time then you will block all other requests to that server, which generally won't be acceptable if you need the server to stay available. But there are lots of times when your Node process is not trying to be a server firing back responses to many requests, such as when you have a Node program that just processes data and isn't a server at all.
Also, using another process is not the only way to do background tasks in Node. There is also webworker-threads which allows you to use threads if that is more convenient (you do have to copy the data in and out).
I would stop reading and do some targeted experiments so you can see what they are talking about. Try to create and test three different programs: 1) an HTTP server that handles lots of requests and always returns immediately with file contents, 2) an HTTP server that handles lots of requests, but where a certain request causes a large math computation that takes many seconds to return (which will block all the other requests -- big problem), 3) a Node program that is not an HTTP server, which does that large math computation and spits out the result in the terminal (which, even though it takes a while to work, is not handling other requests since it's not a server, so it's fine for it to block).
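If you want a smaller experiment before building the servers, this sketch shows the blocking effect directly: a timer that should fire "immediately" cannot run until a synchronous computation lets go of the thread. blockFor and measureTimerDelay are made-up names, and the busy-wait loop stands in for the large math computation.

```javascript
// Stand-in for a long synchronous computation: spin for `ms` milliseconds.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait; nothing else can run on this thread meanwhile
  }
}

// Schedule a 0 ms timer, then block. Resolves with how late the timer
// actually fired, in milliseconds.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    blockFor(blockMs);
  });
}

measureTimerDelay(200).then((lateBy) => {
  console.log(`0 ms timer fired after ${lateBy} ms`);
});
```

In an HTTP server, every pending request is in the same position as that timer.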

Performance implications of using inter-process communication (IPC)

What type of usage is IPC intended for and is it OK to send larger chunks of JSON (hundreds of characters) between processes using IPC? Should I be trying to send as tiny a message as possible using IPC or would the performance gains coming from reducing message size not be worth the effort?

What type of usage is IPC intended for and is it OK to send larger chunks of JSON (hundreds of characters) between processes using IPC?
At its core, IPC is what it says on the tin. It's a tool to use when you need to communicate information between processes, whatever that may be. The topic is very broad, and technically includes allocating shared memory and doing the communication manually, but given the tone of the question, and the tags, I'm assuming you're talking about the OS-provided facilities.
Wikipedia does a pretty good job discussing how IPC is used, and I don't think I can do much better, so I'll concentrate on the second question.
Should I be trying to send as tiny a message as possible using IPC or would the performance gains coming from reducing message size not be worth the effort?
This smells a bit like a micro-optimization. I can't say definitively, because I'm not privy to the source code at Microsoft and Apple, and I really don't want to dig through the Linux kernel's implementation of IPC, but, here's a couple points:
1. IPC is a common operation, so OS designers are likely to optimize it for efficiency. There are teams of engineers that have considered the problem and figured out how to make this fast.
2. The bottleneck in communication across processes/threads is almost always synchronization. Delays are bad, but race conditions and deadlocks are worse. There are, however, lots of creative ways that OS designers can speed up the procedure, since the system controls the process scheduler and memory manager.
3. There are lots of ways to make the data transfer itself fast. For the OS, if the data needs to cross process boundaries, then there is some copying that may need to take place, but the OS copies memory all over the place all the time. Think about a command line utility, like netstat. When that executable is run, memory needs to be allocated, the process needs to be loaded from disk, and any address fixing that the OS needs to do is done, before the process can even start. This is done so quickly that you hardly even notice. On Windows netstat is about 40k, and it loads into memory almost instantly. (Notepad, another fast loader, is 10 times that size, but it still launches in a tiny amount of time.)
The big exception to #2 above is if you're talking about IPC between processes that aren't on the same computer. (Think Windows RPC) Then you're really bound by the speed of the networking/communication stack, but at that point a few kb here or there isn't going to make a whole lot of difference. (You could consider AJAX to be a form of IPC where the 'processes' are the server and your browser. Now consider how fast Google Docs operates.)
If the IPC is between processes on the same system, I don't think that it's worth a ton of effort shaving bytes from your message. Make your message easy to debug.
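To put a rough number on "hundreds of characters", here is a small sketch that measures the size of a readable JSON message and the cost of serializing and parsing it. It assumes the processes exchange JSON (which is what Node's child-process message channel does by default). The sample message and the name roundTripCost are made up, and it times only the serialization, not the OS-level transfer.

```javascript
// A deliberately readable, not byte-shaved, sample message.
const sample = {
  type: 'page-parsed',
  url: 'http://example.com/items/42',
  status: 200,
  fields: { title: 'Example item', price: '19.99', tags: ['a', 'b', 'c'] },
};

// Returns the encoded size and the average stringify+parse time per message.
function roundTripCost(message, iterations = 10000) {
  const wire = JSON.stringify(message);
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    JSON.parse(JSON.stringify(message));
  }
  const elapsedNs = Number(process.hrtime.bigint() - start);
  return {
    bytes: Buffer.byteLength(wire, 'utf8'),
    microsPerRoundTrip: elapsedNs / iterations / 1000,
  };
}

console.log(roundTripCost(sample));
```

Run it on your own message shapes; if the per-message cost is tiny next to the work each message triggers, shaving bytes is unlikely to pay off.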
In the case that the communication is happening between processes on different machines, then you may have something to think about. Still, having spent a lot of time debugging issues that would have been simple with a better data format, I'd say a few dozen extra milliseconds of transit time isn't worth making the data harder to parse/debug. Remember the three rules of optimization [1]:
Don't.
Don't... yet. (For experts)
Profile before you do.
[1] The first two rules are usually attributed to Michael Jackson (the computer scientist, not the singer).

Why is Node.js single threaded? [closed]

Closed 9 years ago.
In PHP (or Java/ASP.NET/Ruby) based webservers every client request is instantiated on a new thread. But in Node.js all the clients run on the same thread (they can even share the same variables!) I understand that I/O operations are event-based so they don't block the main thread loop.
What I don't understand is WHY the author of Node chose it to be single-threaded? It makes things difficult. For example, I can't run a CPU intensive function because it blocks the main thread (and new client requests are blocked) so I need to spawn a process (which means I need to create a separate JavaScript file and execute another node process on it). However, in PHP cpu intensive tasks do not block other clients because as I mentioned each client is on a different thread. What are its advantages compared to multi-threaded web servers?
Note: I've used clustering to get around this, but it's not pretty.

Node.js was created explicitly as an experiment in async processing. The theory was that doing async processing on a single thread could provide more performance and scalability under typical web loads than the typical thread-based implementation.
And you know what? In my opinion that theory's been borne out. A node.js app that isn't doing CPU intensive stuff can run thousands more concurrent connections than Apache or IIS or other thread-based servers.
The single threaded, async nature does make things complicated. But do you honestly think it's more complicated than threading? One race condition can ruin your entire month! Or empty out your thread pool due to some setting somewhere and watch your response time slow to a crawl! Not to mention deadlocks, priority inversions, and all the other gyrations that go with multithreading.
In the end, I don't think it's universally better or worse; it's different, and sometimes it's better and sometimes it's not. Use the right tool for the job.

The issue with the "one thread per request" model for a server is that it doesn't scale well in several scenarios compared to the event loop thread model.
Typically, in I/O intensive scenarios the requests spend most of the time waiting for I/O to complete. During this time, in the "one thread per request" model, the resources linked to the thread (such as memory) are unused and memory is the limiting factor. In the event loop model, the loop thread selects the next event (I/O finished) to handle. So the thread is always busy (if you program it correctly of course).
The event loop model, like all new things, seems shiny and like the solution to every issue, but which model to use will depend on the scenario you need to tackle. If you have an I/O-intensive scenario (like a proxy), the event-based model will rule, whereas a CPU-intensive scenario with a low number of concurrent processes will work best with the thread-based model.
In the real world most scenarios will fall somewhere in the middle. You will need to balance the real need for scalability with the development complexity to find the correct architecture (e.g. have an event-based front end that delegates to the backend for the CPU-intensive tasks; the front end will use few resources while waiting for the task result). As with any distributed system, it requires some effort to make it work.
If you are looking for the silver bullet that will fit with any scenario without any effort, you will end up with a bullet in your foot.

Long story short, node draws from V8, which is internally single-threaded. There are ways to work around the constraints for CPU-intensive tasks.
At one point (0.7) the authors tried to introduce isolates as a way of implementing multiple threads of computation, but they were ultimately removed: https://groups.google.com/forum/#!msg/nodejs/zLzuo292hX0/F7gqfUiKi2sJ

Fastest Way of Storing Data [closed]

Closed 10 years ago.
I have a server which generates some output, like this:
http://192.168.0.1/getJPG=[ID]
I have to go through ID 1 to 20M.
I see that most of the delay is in storing the files. Currently I store every request result as a separate file in a folder, in the form [ID].jpg.
The server responds quickly and the generator server is really fast, but I can't handle the received data rapidly enough.
What is best way of storing data for later processing?
I can do any type of storing, such as in a DB, or in a SINGLE file and later parsing the big file, etc.
I can code in .NET, PHP, C++, etc. No restrictions on programming language. Please advise.
Thanks

So you're downloading 20 million files from a server, and the speed at which you can save them to disk is a bottleneck? If you're accessing the server over the Internet, that's very strange. Perhaps you're downloading over a local network, or maybe the "server" is even running locally.
With 20 million files to save, I'm sure they won't all fit in RAM, so buffering the data in memory won't help. And if the maximum speed at which data can be written to your disk is really a bottleneck, using MS SQL or any other DB will not change anything. There's nothing "magic" about a DB -- it is limited by the performance of your disk, just like any other program.
It sounds like your best bet would be to use multiple disks. Download multiple files in parallel, and as each is received, write it out to a different disk, in a round-robin fashion. The more disks you have, the better. Use multiple threads OR non-blocking I/O so downloads and disk writes all happen concurrently.
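The round-robin part is simple to sketch. The mount points and the name makeRoundRobin below are hypothetical; the function only picks where each file goes, and the actual write would be a non-blocking call such as fs.promises.writeFile.

```javascript
const path = require('node:path');

// Returns a function that assigns each new file to the next directory
// in turn, wrapping around at the end of the list.
function makeRoundRobin(dirs) {
  let next = 0;
  return (fileName) => {
    const dir = dirs[next];
    next = (next + 1) % dirs.length;
    return path.join(dir, fileName);
  };
}

// Hypothetical mount points, one per physical disk.
const pickTarget = makeRoundRobin(['/mnt/disk1', '/mnt/disk2', '/mnt/disk3']);
console.log(pickTarget('1.jpg')); // goes to disk1
console.log(pickTarget('2.jpg')); // goes to disk2
```

You would need to record which disk each ID landed on, or make the choice a pure function of the ID, so the files can be found again later.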

To do this efficiently, I would multi-thread your application (c++).
The main thread of your application will make these web-requests and push them to the back of a std::list. This is all your main application thread will do.
Spawn (and keep it running, do not spawn repeatedly) a pthread (my preferred threading method, even on Windows...) and set it up to check the same std::list in a while loop. In the loop, make sure to check the size of the list and if there are things to be processed, pop the front item off of the list (these can be done in different threads without needing a mutex... most of the time...) and write it to disk.
This will allow you to queue up the responses in memory and at the same time be asynchronously saving the files to disk. If your server really is as quick as you say it is, you might run out of memory. Then I would implement some 'waiting' if the number of items to be processed are over a certain threshold, but this will only run a little better than doing it serially.
The real way to 'improve' the speed of this is to have many worker threads (each with their own std::list and 'smart' pushing onto the list with the least items or one std::list shared with a mutex) processing the files. If you have a multi-core machine with multiple hard drives, this will greatly increase the speed of saving these files to disk.
The other solution is to off-load the saving of the files to many different computers as well (if the number of disks on your current computer is limiting the writes). By using a message passing system such as ZMQ/0MQ, you'd be able to very easily push off the saving of files to different systems (which are setup in a PULL fashion) with more hard drives being accessible than just what is currently on one machine. Using ZMQ makes the round-robin style message passing trivial as a fan-out architecture is built in and is literally minutes to implement.
Yet another solution is to create a ramdisk (easily done natively on Linux; for Windows... I've used this). This will allow you to parallelize the writing of the files with as many writers as you want without issue. Then you'd need to make sure to copy those files to a real storage location before you restart or you'd lose the files. But while it's running, you'd be able to store the files in real time without issue.

Probably it helps to access the disk sequentially. Here is a simple trick to do this: Stream all incoming files to an uncompressed ZIP file (there are libraries for that). This makes all IO sequential and there is only one file. You can also split off a new ZIP file after 10000 images or so to keep the individual ZIPs small.
You can later read all files by streaming out of the ZIP file. Little overhead there as it is uncompressed.

It sounds like you are trying to write an application which downloads as much content as you can as quickly as possible. You should be aware that when you do this, chances are people will notice as this will suck up a good amount of bandwidth and other resources.
Since this is Windows/NTFS, there are some things you need to keep in mind:
- Do not have more than 2k files in one folder.
- Use async/buffered writes as much as possible.
- Spread over as many disks as you have available for best I/O performance.
One thing that wasn't mentioned that is somewhat important is file size. Since it looks like you are fetching JPEGs, I'm going to assume an average files size of ~50k.
I've recently done something like this with an endless stream of ~1KB text files using .Net 4.0 and was able to saturate a 100mbit network controller on the local net. I used the TaskFactory to generate HttpWebRequest threads to download the data to memory streams. I buffered them in memory so I did not have to write them to disk. The basic approach I would recommend is similar - Spin off threads that each make the request, grab the response stream, and write it to disk. The hardest part will be generating the sequential folders and file names. You want to do this as quickly as possible, make it thread safe, and do your bookkeeping in memory to avoid hitting the disk with unnecessary calls for directory contents.
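For the folder bookkeeping, one option is to compute the folder purely from the ID, so no directory listing is ever needed. This is a sketch under the assumptions above: IDs run from 1 to 20M, at most 2,000 entries per folder, and a two-level layout so the top level stays small too. shardedPath and the layout itself are made up for illustration.

```javascript
const path = require('node:path');

// Map an ID to <root>/<top>/<sub>/<id>.jpg with at most `perFolder`
// files per leaf folder and at most `perFolder` leaf folders per top folder.
function shardedPath(id, rootDir, perFolder = 2000) {
  const leaf = Math.floor((id - 1) / perFolder); // which group of files
  const top = Math.floor(leaf / perFolder);      // which top-level folder
  const sub = leaf % perFolder;                  // which folder inside it
  return path.join(rootDir, String(top), String(sub), `${id}.jpg`);
}

console.log(shardedPath(1, 'out'));
console.log(shardedPath(20000000, 'out'));
```

With 20M IDs this gives 5 top-level folders of 2,000 subfolders each, and because the mapping is a pure function it is trivially thread safe.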
I would not worry about trying to sequence your writes. There are enough layers of the OS/NTFS that will try and do this for you. You should be saturating some piece of your pipe in no time.

What kinds of applications need to be multi-threaded?

What are some concrete examples of applications that need to be multi-threaded, or don't need to be, but are much better that way?
Answers would be best if in the form of one application per post that way the most applicable will float to the top.

There is no hard and fast answer, but most of the time you will not see any advantage for systems where the workflow/calculation is sequential. If however the problem can be broken down into tasks that can be run in parallel (or the problem itself is massively parallel [as some mathematics or analytical problems are]), you can see large improvements.
If your target hardware is single processor/core, you're unlikely to see any improvement with multi-threaded solutions (as only one thread runs at a time anyway!)
Writing multi-threaded code is often harder as you may have to invest time in creating thread management logic.
Some examples
Image processing can often be done in parallel (e.g. split the image into 4 and do the work in 1/4 of the time) but it depends upon the algorithm being run to see if that makes sense.
Rendering of animation (from 3DMax,etc.) is massively parallel as each frame can be rendered independently to others -- meaning that 10's or 100's of computers can be chained together to help out.
GUI programming often helps to have at least two threads when doing something slow, e.g. processing large number of files - this allows the interface to remain responsive whilst the worker does the hard work (in C# the BackgroundWorker is an example of this)
GUIs are an interesting area, as the "responsiveness" of the interface can be maintained without multi-threading if the worker algorithm keeps the main GUI "alive" by giving it time. In Windows API terms (before .NET, etc.) this could be achieved by a primitive loop with no need for threading:
    MSG msg;
    // GetMessage returns 0 on WM_QUIT and -1 on error, so test for > 0
    while (GetMessage(&msg, hwnd, 0, 0) > 0)
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
        // do some stuff here and then release, the loop will come back
        // almost immediately (unless the user has quit)
    }

Servers are typically multi-threaded (web servers, radius servers, email servers, any server): you usually want to be able to handle multiple requests simultaneously. If you do not want to wait for a request to end before you start to handle a new request, then you mainly have two options:
Run a process with multiple threads
Run multiple processes
Launching a process is usually more resource-intensive than launching a thread (or picking one from a thread pool), so servers are usually multi-threaded. Moreover, threads can communicate directly since they share the same memory space.
The problem with multiple threads is that they are usually harder to code right than multiple processes.

There are really three classes of reasons that multithreading would be applied:
Execution Concurrency to improve compute performance: If you have a problem that can be broken down into pieces and you also have more than one execution unit (processor core) available then dispatching the pieces into separate threads is the path to being able to simultaneously use two or more cores at once.
Concurrency of CPU and IO Operations: This is similar in thinking to the first one but in this case the objective is to keep the CPU busy AND also IO operations (ie: disk I/O) moving in parallel rather than alternating between them.
Program Design and Responsiveness: Many types of programs can take advantage of threading as a program design benefit to make the program more responsive to the user. For example the program can be interacting via the GUI and also doing something in the background.
Concrete Examples:
Microsoft Word: Edit document while the background grammar and spell checker works to add all the green and red squiggle underlines.
Microsoft Excel: Automatic background recalculations after cell edits
Web Browser: Dispatch multiple threads to load each of the several HTML references in parallel during a single page load. Speeds page loads and maximizes TCP/IP data throughput.

These days, the answer should be Any application that can be.
The speed of execution for a single thread pretty much peaked years ago - processors have been getting faster by adding cores, not by increasing clock speeds. There have been some architectural improvements that make better use of the available clock cycles, but really, the future is taking advantage of threading.
There is a ton of research going on into finding ways of parallelizing activities that we traditionally wouldn't think of parallelizing. Even something as simple as finding a substring within a string can be parallelized.
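
To make the substring example concrete, here is one possible decomposition, sketched in Python. The function name `parallel_find` and the chunking scheme are my own assumptions, not something from the research mentioned above. Each worker searches one chunk, and chunks overlap by `len(needle) - 1` characters so a match straddling a boundary is not missed. Note that in CPython the threads will not actually run the searches on several cores at once, so this shows how the work is split rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_find(text, needle, workers=4):
    """Return the index of the first occurrence of needle in text, or -1."""
    if not needle:
        return 0
    # Ceiling division, but never smaller than the needle itself.
    chunk = max(len(needle), -(-len(text) // workers))
    overlap = len(needle) - 1

    def search(start):
        # Extend the chunk so matches crossing the chunk boundary are still seen.
        end = min(len(text), start + chunk + overlap)
        return text.find(needle, start, end)

    starts = range(0, len(text), chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = [h for h in pool.map(search, starts) if h != -1]
    return min(hits) if hits else -1


if __name__ == "__main__":
    haystack = "ab" * 50 + "needle" + "cd" * 50
    print(parallel_find(haystack, "needle"), haystack.find("needle"))
```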

Basically there are two reasons to multi-thread:
To be able to do processing tasks in parallel. This only applies if you have multiple cores/processors, otherwise on a single core/processor computer you will slow the task down compared to the version without threads.
I/O whether that be networked I/O or file I/O. Normally if you call a blocking I/O call, the process has to wait for the call to complete. Since the processor/memory are several orders of magnitude quicker than a disk drive (and a network is even slower) it means the processor will be waiting a long time. The computer will be working on other things but your application will not be making any progress. However if you have multiple threads, the computer will schedule your application and the other threads can execute. One common use is a GUI application. Then while the application is doing I/O the GUI thread can keep refreshing the screen without looking like the app is frozen or not responding. Even on a single processor putting I/O in a different thread will tend to speed up the application.
The single threaded alternative to 2 is to use asynchronous calls where they return immediately and you keep controlling your program. Then you have to see when the I/O completes and manage using it. It is often simpler just to use a thread to do the I/O using the synchronous calls as they tend to be easier.
The reason to use threads instead of separate processes is that threads should be able to share data more easily than multiple processes. And sometimes switching between threads is less expensive than switching between processes.
As another note, for #1 Python threads won't help, because in CPython only one thread can execute Python bytecode at a time (this is enforced by the GIL, or Global Interpreter Lock). I use that as an example, but you need to check around your language. In Python, if you want to do parallel calculations, you need to use separate processes.
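
Reason 2 is easy to see with a small sketch. Here `time.sleep` stands in for a blocking I/O call (it releases the GIL while waiting, as real blocking I/O does), and `fake_download` and `fetch_all` are hypothetical names. Four waits of 0.2 seconds each overlap, so the total is close to 0.2 seconds rather than 0.8.

```python
import threading
import time


def fake_download(name, results, delay):
    """Stand-in for a blocking I/O call such as a network read."""
    time.sleep(delay)
    results[name] = f"contents of {name}"


def fetch_all(names, delay=0.2):
    """Run one blocking call per thread and report the total elapsed time."""
    results = {}
    threads = [
        threading.Thread(target=fake_download, args=(name, results, delay))
        for name in names
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = fetch_all(["a", "b", "c", "d"])
    print(sorted(results), f"{elapsed:.2f}s")
```

This works even on a single core and even with the GIL, because the threads spend their time waiting, not computing.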

Many GUI frameworks are multi-threaded. This allows you to have a more responsive interface. For example, you can click on a "Cancel" button at any time while a long calculation is running.
Note that there are other solutions for this (for example the program can pause the calculation every half-a-second to check whether you clicked on the Cancel button or not), but they do not offer the same level of responsiveness (the GUI might seem to freeze for a few seconds while a file is being read or a calculation being done).
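
A minimal sketch of the "Cancel" case, assuming a worker thread and a shared flag (all names here are made up, and the sleep in `run_with_cancel` stands in for the user eventually clicking the button). The calculation runs in its own thread and checks a `threading.Event` between steps, so the thread that owns the GUI is never blocked.

```python
import threading
import time


def long_calculation(cancel, progress):
    """Hypothetical long job that checks the cancel flag between steps."""
    for step in range(1000):
        if cancel.is_set():
            return "cancelled"
        progress.append(step)
        time.sleep(0.01)  # one unit of work
    return "finished"


def run_with_cancel(cancel_after=0.05):
    cancel = threading.Event()
    progress = []
    outcome = []
    worker = threading.Thread(
        target=lambda: outcome.append(long_calculation(cancel, progress))
    )
    worker.start()
    time.sleep(cancel_after)  # the "GUI thread" is free during this time
    cancel.set()              # the user clicked Cancel
    worker.join()
    return outcome[0], len(progress)


if __name__ == "__main__":
    print(run_with_cancel())
```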

All the answers so far are focusing on the fact that multi-threading or multi-processing are necessary to make the best use of modern hardware.
There is however also the fact that multithreading can make life much easier for the programmer. At work I program software to control manufacturing and testing equipment, where a single machine often consists of several positions that work in parallel. Using multiple threads for that kind of software is a natural fit, as the parallel threads model the physical reality quite well. The threads mostly do not need to exchange any data, so the need to synchronize threads is rare, and many of the reasons for multithreading being difficult therefore do not apply.
Edit:
This is not really about a performance improvement, as the (maybe 5, maybe 10) threads are all mostly sleeping. It is however a huge improvement for the program structure when the various parallel processes can be coded as sequences of actions that do not know of each other. I have very bad memories from the times of 16-bit Windows, when I would create a state machine for each machine position, make sure that nothing would take longer than a few milliseconds, and constantly pass the control to the next state machine. When there were hardware events that needed to be serviced on time, and also computations that took a while (like FFT), then things would get ugly real fast.
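
To illustrate the structure (not the real equipment code, which is not shown here), a sketch with invented position and step names: each position is written as a plain sequence of actions in its own thread, and the only shared object is a log.

```python
import threading
import time


def run_position(name, steps, log, lock):
    """One machine position: a simple sequence that knows nothing of the others."""
    for step in steps:
        time.sleep(0.01)  # stands in for waiting on the hardware
        with lock:        # the shared log is the only point of contact
            log.append((name, step))


def run_machine(positions):
    log = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=run_position, args=(name, steps, log, lock))
        for name, steps in positions.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_machine({
        "load": ["clamp", "measure", "release"],
        "test": ["probe", "record"],
    }))
```

Entries from different positions interleave in the log, but each position's own steps always appear in the order they were written, with no state machine in sight.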

Not directly answering your question, I believe in the very near future, almost every application will need to be multithreaded. CPU performance is not growing that fast these days, which is compensated for by the increasing number of cores. Thus, if we want our applications to stay on top performance-wise, we'll need to find ways to utilize all of the computer's CPUs and keep them busy, which is quite a hard job.
This can be done by telling your programs what to do instead of telling them exactly how. Now, this is a topic I personally find very interesting recently. Some functional languages, like F#, are able to parallelize many tasks quite easily. Well, not THAT easily, but still without the infrastructure needed in more procedural-style environments.
Please take this as additional information to think about, not an attempt to answer your question.

The kind of applications that need to be threaded are the ones where you want to do more than one thing at once. Other than that no application needs to be multi-threaded.

Applications with a large workload which can be easily made parallel. The difficulty of taking your application and doing that should not be underestimated. It is easy when the data you're manipulating is not dependent upon other data, but very hard to schedule the cross-thread work when there is a dependency.
Some examples I've done which are good multithreaded candidates:
running scenarios (eg stock derivative pricing, statistics)
bulk updating data files (eg adding a value / entry to 10,000 records)
other mathematical processes
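
The bulk-update case could look roughly like this in Python. This is only a sketch: `add_field`, the `"checked"` key and the chunk size are assumptions for the example. Because no record depends on another, the list can be cut into chunks and each chunk handed to a worker. For CPU-heavy updates in CPython a process pool would be the better fit, since threads there do not run Python code on several cores at once.

```python
from concurrent.futures import ThreadPoolExecutor


def add_field(record):
    """Hypothetical per-record update; returns a new record, leaves the input alone."""
    return {**record, "checked": True}


def update_in_chunks(records, chunk_size=1000, workers=4):
    """Split the records into chunks and update each chunk in a worker."""
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map keeps chunk order, so the output lines up with the input
        updated = pool.map(lambda chunk: [add_field(r) for r in chunk], chunks)
        return [record for chunk in updated for record in chunk]


if __name__ == "__main__":
    out = update_in_chunks([{"id": i} for i in range(10000)])
    print(len(out), out[0])
```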

E.g., you want your programs to be multithreaded when you want to utilize multiple cores and/or CPUs, even when the programs don't necessarily do many things at the same time.
EDIT: using multiple processes is the same thing. Which technique to use depends on the platform and how you are going to do communications within your program, etc.

Although frivolous, games in general are becoming more and more threaded every year. At work our game uses around 10 threads doing physics, AI, animation, rendering, network and IO.

Just want to add that caution must be taken with threads if you're sharing any resources, as this can lead to some very strange behavior, with your code not working correctly or even the threads locking each other out.
A mutex will help you there, as you can use mutex locks for protected code regions. An example of a protected code region would be reading or writing to memory shared between threads.
just my 2 cents worth.
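
A small sketch of such a protected region, using Python's `threading.Lock` as the mutex (the counter and function name are made up for the example). Four threads each increment a shared counter 10,000 times, and the lock makes every read-modify-write happen as one unit.

```python
import threading


def count_with_lock(threads=4, increments=10000):
    counter = {"value": 0}
    lock = threading.Lock()

    def work():
        for _ in range(increments):
            with lock:  # protected region: read-modify-write on shared state
                counter["value"] += 1

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter["value"]


if __name__ == "__main__":
    print(count_with_lock())
```

Keeping the locked region this small also limits the chance of threads blocking each other for long.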

The main purpose of multithreading is to separate time domains. So the uses are everywhere where you want several things to happen in their own distinctly separate time domains.

HERE IS A PERFECT USE CASE
If you like affiliate marketing, multi-threading is essential. Kick the entire process off via a multi-threaded application.
Download merchant files via FTP, unzip the files, enumerate through each file performing cleanup such as converting EOL terminators from Unix to PC CRLF, then slam each into SQL Server via bulk inserts. When all threads are complete, create the full-text search indexes for an environment instance to be live tomorrow, and you're done. All automated to kick off at, say, 11:00 pm.
BOOM! Fast as lightning. Heck, you have so much time left you can even download merchant images locally for the products you download, save the images as WebP and set the product URLs to use local images.
Yep, I did it. Wrote it in C#. Works like a charm. Purchase an AMD Ryzen Threadripper 64-core with 256 GB of memory and fast drives like NVMe, get lunch, come back and see it all done, or just stay around and watch all cores peg to 95%+, listen to the PC's fans kick in, warm up the room, and then look outside as the neighbors' lights flicker from the power drain as you get shit done.
Future would be to push processing to GPUs as well.
Ok, well, I am pushing it a little bit with the neighbors' lights flickering, but all else was absolutely true. :)
