How to articulate the difference between asynchronous and parallel programming? - multithreading

Many platforms promote asynchrony and parallelism as means for improving responsiveness. I understand the difference generally, but often find it difficult to articulate in my own mind, as well as for others.
I am a workaday programmer and use async & callbacks fairly often. Parallelism feels exotic.
But I feel like they are easily conflated, especially at the language design level. Would love a clear description of how they relate (or don't), and the classes of programs where each is best applied.

When you run something asynchronously it means it is non-blocking: you execute it without waiting for it to complete and carry on with other things. Parallelism means running multiple things at the same time, in parallel. Parallelism works well when you can separate tasks into independent pieces of work.
Take for example rendering frames of a 3D animation. To render the animation takes a long time so if you were to launch that render from within your animation editing software you would make sure it was running asynchronously so it didn't lock up your UI and you could continue doing other things. Now, each frame of that animation can also be considered as an individual task. If we have multiple CPUs/Cores or multiple machines available, we can render multiple frames in parallel to speed up the overall workload.
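A minimal sketch of that split in Python (render_frame is a made-up stand-in for the real per-frame work): submitting the jobs is the asynchronous part, because the calls return immediately, while the pool of worker processes chewing through many frames at once is the parallel part.

import concurrent.futures as cf

def render_frame(i):
    return f"frame-{i:04d}.png"   # stand-in for the expensive per-frame work

if __name__ == "__main__":
    with cf.ProcessPoolExecutor() as pool:           # one worker per core by default
        futures = [pool.submit(render_frame, i) for i in range(240)]
        # submit() returns immediately: the launch is asynchronous, so this
        # thread could keep servicing the editor's UI instead of blocking
        for fut in cf.as_completed(futures):         # frames finish in parallel
            print("rendered", fut.result())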

I believe the main distinction is between concurrency and parallelism.
Async and Callbacks are generally a way (tool or mechanism) to express concurrency i.e. a set of entities possibly talking to each other and sharing resources.
In the case of async or callbacks, communication is implicit, while sharing of resources is optional (consider RMI, where results are computed on a remote machine).
As correctly noted, this is usually done with responsiveness in mind: to avoid waiting for long-latency events.
Parallel programming usually has throughput as the main objective, while latency, i.e. the completion time for a single element, might be worse than in an equivalent sequential program.
To better understand the distinction between concurrency and parallelism I am going to quote from Probabilistic models for concurrency by Daniele Varacca, which is a good set of notes on the theory of concurrency:
A model of computation is a model for concurrency when it is able to represent systems as composed of independent autonomous components, possibly communicating with each other. The notion of concurrency should not be confused with the notion of parallelism. Parallel computations usually involve a central control which distributes the work among several processors. In concurrency we stress the independence of the components, and the fact that they communicate with each other. Parallelism is like ancient Egypt, where the Pharaoh decides and the slaves work. Concurrency is like modern Italy, where everybody does what they want, and all use mobile phones.
In conclusion, parallel programming is in some sense a special case of concurrency, where separate entities collaborate to obtain high performance and throughput (generally).
Async and callbacks are just a mechanism that allows the programmer to express concurrency.
Consider that well-known parallel programming design patterns such as master/worker or map/reduce are implemented by frameworks that use such lower-level mechanisms (async) to implement more complex centralized interactions, as sketched below.
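As a toy illustration of that layering, here is a master/worker sketch in Python: the centralized "master" (a thread pool) distributes independent pieces of work, and the lower-level async mechanism (a completion callback) reports each result. The work and collect functions are invented for the example.

import concurrent.futures as cf

def work(item):
    return item * item                 # one independent piece of work

def collect(future):
    results.append(future.result())    # the async callback reporting completion

results = []
with cf.ThreadPoolExecutor(max_workers=4) as master:   # the "master" hands out work
    for item in range(10):
        master.submit(work, item).add_done_callback(collect)
print(sorted(results))                 # all pieces collected once the pool drains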

This article explains it very well: http://urda.cc/blog/2010/10/04/asynchronous-versus-parallel-programming
It has this about asynchronous programming:
Asynchronous calls are used to prevent “blocking” within an application. [Such a] call will spin-off in an already existing thread (such as an I/O thread) and do its task when it can.
this about parallel programming:
In parallel programming you still break up work or tasks, but the key difference is that you spin up new threads for each chunk of work
and this in summary:
asynchronous calls will use threads already in use by the system and parallel programming requires the developer to break the work up, spin up, and tear down the threads needed.

async: Do this by yourself somewhere else and notify me when you complete (callback). In the meantime, I can continue doing my own thing.
parallel: Hire as many guys (threads) as you wish, split the job among them so it completes quicker, and let me know (callback) when you're done. In the meantime, I might continue with my other stuff.
The main difference is that parallelism mostly depends on hardware.

My basic understanding is:
Asynchronous programming solves the problem of waiting around for an expensive operation to complete before you can do anything else. If you can get other stuff done while you're waiting for the operation to complete, then that's a good thing. Example: keeping a UI running while you go and retrieve more data from a web service.
Parallel programming is related but is more concerned with breaking a large task into smaller chunks that can be computed at the same time. The results of the smaller chunks can then be combined to produce the overall result. Example: ray-tracing where the colour of individual pixels is essentially independent.
It's probably more complicated than that, but I think that's the basic distinction.

I tend to think of the difference in these terms:
Asynchronous: Go away and do this task; when you're finished, come back, tell me, and bring the results. I'll be getting on with other things in the meantime.
Parallel: I want you to do this task. If it makes it easier, get some folks in to help. This is urgent though, so I'll wait here until you come back with the results. I can do nothing else until you come back.
Of course an asynchronous task might make use of parallelism, but the differentiation - to my mind at least - is whether you get on with other things while the operation is being carried out or if you stop everything completely until the results are in.

It is a question of order of execution.
If A is asynchronous with B, then I cannot predict beforehand when subparts of A will happen with respect to subparts of B.
If A is parallel with B, then things in A are happening at the same time as things in B. However, an order of execution may still be defined.
Perhaps the difficulty is that the word asynchronous is equivocal.
I execute an asynchronous task when I tell my butler to run to the store for more wine and cheese, and then forget about him and work on my novel until he knocks on the study door again. Parallelism is happening here, but the butler and I are engaged in fundamentally different tasks and of different social classes, so we don't apply that label here.
My team of maids is working in parallel when each of them is washing a different window.
My race car support team is asynchronously parallel in that each team works on a different tire and they don't need to communicate with each other or manage shared resources while they do their job.
My football (aka soccer) team does parallel work as each player independently processes information about the field and moves about on it, but they are not fully asynchronous because they must communicate and respond to the communication of others.
My marching band is also parallel as each player reads music and controls their instrument, but they are highly synchronous: they play and march in time to each other.
A cammed gatling gun could be considered parallel, but everything is 100% synchronous, so it is as though one process is moving forward.

Why Asynchronous?
Today's applications are growing more and more connected, and they involve potentially long-running tasks or blocking operations such as network I/O or database operations. So it's very important to hide the latency of these operations by starting them in the background and returning to the user interface as quickly as possible. This is where asynchrony comes into the picture: responsiveness.
Why parallel programming?
Today's data sets are growing larger and computations are growing more complex. So it's very important to reduce the execution time of these CPU-bound operations, in this case by dividing the workload into chunks and then executing those chunks simultaneously. We can call this "parallel".
Obviously it will give our application higher performance.
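A minimal sketch of the asynchronous half in Python with asyncio, where sleep stands in for a slow network call and the URLs are made up: the three waits overlap, so the total latency is about one second rather than three, and the caller is never blocked.

import asyncio

async def fetch(url):
    await asyncio.sleep(1)        # stands in for a slow network round-trip
    return url

async def main():
    urls = [f"http://example.com/{i}" for i in range(3)]   # hypothetical URLs
    print(await asyncio.gather(*(fetch(u) for u in urls)))

asyncio.run(main())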

Asynchronous
Let's say you are the point of contact for your client and you need to be responsive i.e. you need to share status, complexity of operation, resources required etc whenever asked. Now you have a time-consuming operation to be done and hence cannot take this up as you need to be responsive to the client 24/7. Hence, you delegate the time-consuming operation to someone else so that you can be responsive. This is asynchronous.
Parallel programming
Let's say you have a task to read, say, 100 lines from a text file, and reading one line takes 1 second. Hence, you'll require 100 seconds to read the text file. Now you're worried that the client must wait for 100 seconds for the operation to finish. Hence you create 9 more clones and make each of them read 10 lines from the text file. Now the time taken is only 10 seconds to read 100 lines. Hence you have better performance.
To sum up, asynchronous coding is done to achieve responsiveness and parallel programming is done for performance.
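A rough Python sketch of the 10-clones idea, with sleep standing in for the 1-second-per-line reads. Threads suit this kind of waiting-style work; truly CPU-bound chunks would want processes instead.

import concurrent.futures as cf
import time

lines = [f"line {n}" for n in range(100)]     # stands in for the 100-line file

def read_chunk(chunk):
    time.sleep(len(chunk))                    # pretend each line takes 1 second
    return len(chunk)

with cf.ThreadPoolExecutor(max_workers=10) as pool:
    chunks = [lines[i:i + 10] for i in range(0, 100, 10)]   # 10 lines per clone
    total = sum(pool.map(read_chunk, chunks))               # ~10 s instead of ~100 s
print(total, "lines read")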

Asynchronous: Running a method or task in the background, without blocking. It may not necessarily run on a separate thread. It uses context switching / time scheduling.
Parallel tasks: Each task runs in parallel. They do not use context switching / time scheduling.

I came here fairly comfortable with the two concepts, but with something not clear to me about them.
After reading through some of the answers, I think I have a correct and helpful metaphor to describe the difference.
If you think of your individual lines of code as separate but ordered playing cards (stop me if I am explaining how old-school punch cards work), then for each separate procedure written you will have a unique stack of cards (don't copy & paste!), and the difference between running code normally and running it asynchronously depends on whether you care about ordering or not.
When you run the code, you hand the OS a set of single operations (that your compiler or interpreter broke your "higher" level code into) to be passed to the processor. With one processor, only one line of code can be executed at any one time. So, in order to accomplish the illusion of running multiple processes at the same time, the OS uses a technique in which it sends the processor only a few lines from a given process at a time, switching between all the processes according to how it sees fit. The result is multiple processes showing progress to the end user at what seems to be the same time.
For our metaphor, the relationship is that the OS always shuffles the cards before sending them to the processor. If your stack of cards doesn't depend on another stack, you don't notice that your stack stopped getting selected from while another stack became active. So if you don't care, it doesn't matter.
However, if you do care (e.g., there are multiple processes - or stacks of cards - that do depend on each other), then the OS's shuffling will screw up your results.
Writing asynchronous code requires handling the dependencies between the order of execution regardless of what that ordering ends up being. This is why constructs like "call-backs" are used. They say to the processor, "the next thing to do is tell the other stack what we did". By using such tools, you can be assured that the other stack gets notified before it allows the OS to run any more of its instructions. ("If called_back == false: send(no_operation)" - not sure if this is actually how it is implemented, but logically, I think it is consistent.)
For parallel processes, the difference is that you have two stacks that don't care about each other and two workers to process them. At the end of the day, you may need to combine the results from the two stacks, which is then a matter of synchronization; but for execution, again, you don't care.
Not sure if this helps but, I always find multiple explanations helpful. Also, note that asynchronous execution is not constrained to an individual computer and its processors. Generally speaking, it deals with time, or (even more generally speaking) an order of events. So if you send dependent stack A to network node X and its coupled stack B to Y, the correct asynchronous code should be able to account for the situation as if it was running locally on your laptop.

Generally, there are only two ways you can do more than one thing at a time. One is asynchronous, the other is parallel.
At a high level, the popular server NGINX and the famous Python library Tornado both fully utilize the asynchronous paradigm: a single-threaded server can simultaneously serve thousands of clients (via an I/O loop and callbacks), using exceptional control flow (ECF) to implement the asynchronous programming paradigm. So asynchronous code sometimes doesn't really do things simultaneously, but for I/O-bound work it can really improve performance.
The parallel paradigm usually refers to multi-threading and multiprocessing. It can fully utilize multi-core processors and do things truly simultaneously.
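For a feel of that single-threaded, event-driven style, here is a minimal asyncio echo-server sketch in Python (not Tornado itself, just the same paradigm):

import asyncio

async def handle(reader, writer):
    # each client gets its own coroutine; while one awaits I/O,
    # the single-threaded event loop services all the others
    data = await reader.readline()
    writer.write(data)                        # echo the line back
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()          # thousands of clients, one thread

asyncio.run(main())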

Summary of all above answers
parallel computing:
▪ solves the throughput issue.
Concerned with breaking a large task into smaller chunks.
▪ is machine related (multiple machines/cores/CPUs/processors needed), e.g. master/slave, map/reduce.
Parallel computations usually involve a central control which distributes the work among several processors.
asynchronous:
▪ solves the latency issue,
i.e. the problem of 'waiting around' for an expensive operation to complete before you can do anything else.
▪ is thread related (multiple threads needed).
Threading (using Thread, Runnable, Executor) is one fundamental way to perform asynchronous operations in Java.


What is the difference between concurrency and parallelism?
Concurrency is when two or more tasks can start, run, and complete in overlapping time periods. It doesn't necessarily mean they'll ever both be running at the same instant. For example, multitasking on a single-core machine.
Parallelism is when tasks literally run at the same time, e.g., on a multicore processor.
Quoting Sun's Multithreaded Programming Guide:
Concurrency: A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.
Parallelism: A condition that arises when at least two threads are executing simultaneously.
Why the Confusion Exists
Confusion exists because the dictionary meanings of both these words are almost the same:
Concurrent: existing, happening, or done at the same time (dictionary.com)
Parallel: very similar and often happening at the same time (Merriam-Webster).
Yet the way they are used in computer science and programming are quite different. Here is my interpretation:
Concurrency: Interruptability
Parallelism: Independentability
So what do I mean by the above definitions?
I will clarify with a real-world analogy. Let's say you have to get 2 very important tasks done in one day:
Get a passport
Get a presentation done
Now, the problem is that task-1 requires you to go to an extremely bureaucratic government office that makes you wait for 4 hours in a line to get your passport. Meanwhile, task-2 is required by your office, and it is a critical task. Both must be finished on a specific day.
Case 1: Sequential Execution
Ordinarily, you will drive to the passport office for 2 hours, wait in the line for 4 hours, get the task done, drive back for two hours, go home, stay awake 5 more hours and get the presentation done.
Case 2: Concurrent Execution
But you’re smart. You plan ahead. You carry a laptop with you, and while waiting in the line, you start working on your presentation. This way, once you get back at home, you just need to work 1 extra hour instead of 5.
In this case, both tasks are done by you, just in pieces. You interrupted the passport task while waiting in the line and worked on the presentation. When your number was called, you interrupted the presentation task and switched to the passport task. The saving in time was essentially possible due to the interruptability of both tasks.
Concurrency, IMO, can be understood as the "isolation" property in ACID. Two database transactions are considered isolated if sub-transactions can be performed in each and any interleaved way and the final result is the same as if the two tasks were done sequentially. Remember that, for both the passport and presentation tasks, you are the sole executor.
Case 3: Parallel Execution
Now, since you are such a smart fella, you're obviously a higher-up, and you have got an assistant. So, before you leave to start the passport task, you call him and tell him to prepare the first draft of the presentation. You spend your entire day and finish the passport task, come back and see your mail, and you find the presentation draft. He has done a pretty solid job, and with some edits in 2 more hours, you finalize it.
Now, since your assistant is just as smart as you, he was able to work on it independently, without needing to constantly ask you for clarifications. Thus, due to the independentability of the tasks, they were performed at the same time by two different executors.
Still with me? Alright...
Case 4: Concurrent But Not Parallel
Remember your passport task, where you have to wait in the line?
Since it is your passport, your assistant cannot wait in line for you. Thus, the passport task has interruptability (you can stop it while waiting in the line, and resume it later when your number is called), but no independentability (your assistant cannot wait in your stead).
Case 5: Parallel But Not Concurrent
Suppose the government office has a security check to enter the premises. Here, you must remove all electronic devices and submit them to the officers, and they only return your devices after you complete your task.
In this case, the passport task is neither independentable nor interruptible. Even if you are waiting in the line, you cannot work on something else because you do not have the necessary equipment.
Similarly, say the presentation is so highly mathematical in nature that you require 100% concentration for at least 5 hours. You cannot do it while waiting in line for passport task, even if you have your laptop with you.
In this case, the presentation task is independentable (either you or your assistant can put in 5 hours of focused effort), but not interruptible.
Case 6: Concurrent and Parallel Execution
Now, say that in addition to assigning your assistant to the presentation, you also carry a laptop with you to the passport task. While waiting in the line, you see that your assistant has created the first 10 slides in a shared deck. You send comments on his work with some corrections. Later, when you arrive back home, instead of 2 hours to finalize the draft, you just need 15 minutes.
This was possible because presentation task has independentability (either one of you can do it) and interruptability (you can stop it and resume it later). So you concurrently executed both tasks, and executed the presentation task in parallel.
Let's say that, in addition to being overly bureaucratic, the government office is corrupt. Thus, you can show your identification, enter the office, start waiting in the line for your number to be called, bribe a guard and someone else to hold your position in the line, sneak out, come back before your number is called, and resume waiting yourself.
In this case, you can perform both the passport and presentation tasks concurrently and in parallel. You can sneak out, and your position is held by your assistant. Both of you can then work on the presentation, etc.
Back to Computer Science
In the computing world, here are example scenarios typical of each of these cases (a short code sketch of Cases 2 and 3 follows the list):
Case 1: Interrupt processing.
Case 2: When there is only one processor, but all executing tasks have wait times due to I/O.
Case 3: Often seen when we are talking about map-reduce or Hadoop clusters.
Case 4: I think Case 4 is rare. It’s uncommon for a task to be concurrent but not parallel. But it could happen. For example, suppose your task requires access to a special computational chip that can be accessed through only processor-1. Thus, even if processor-2 is free and processor-1 is performing some other task, the special computation task cannot proceed on processor-2.
Case 5: also rare, but not quite as rare as Case 4. A non-concurrent code can be a critical region protected by mutexes. Once it is started, it must execute to completion. However, two different critical regions can progress simultaneously on two different processors.
Case 6: IMO, most discussions about parallel or concurrent programming are basically talking about Case 6. This is a mix and match of both parallel and concurrent executions.
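A compact Python sketch of Case 2 versus Case 3, with sleep standing in for an I/O wait and a big sum standing in for real computation:

import asyncio
import concurrent.futures as cf

async def io_task(name):
    await asyncio.sleep(1)        # stands in for waiting on I/O
    return name

def cpu_task(n):
    return sum(range(n))          # stands in for real computation

async def case2():
    # Case 2: one thread interleaves both tasks during their waits (concurrent)
    return await asyncio.gather(io_task("passport"), io_task("presentation"))

if __name__ == "__main__":
    print(asyncio.run(case2()))
    # Case 3: two worker processes compute at the same instant (parallel)
    with cf.ProcessPoolExecutor(max_workers=2) as pool:
        print(list(pool.map(cpu_task, [10**6, 2 * 10**6])))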
Concurrency and Go
To see why Rob Pike says concurrency is better, you have to understand the reason. You have a really long task in which there are multiple waiting periods where you wait for some external operation like a file read or a network download. In his lecture, all he is saying is, "just break up this long sequential task so that you can do something useful while you wait." That is why he talks about different organizations with various gophers.
Now the strength of Go comes from making this breaking really easy with the go keyword and channels. Also, there is excellent underlying support in the runtime to schedule these goroutines.
But essentially, is concurrency better than parallelism?
Are apples better than oranges?
I like Rob Pike's talk: Concurrency is not Parallelism (it's better!)
(slides)
(talk)
Rob usually talks about Go and addresses the question of concurrency vs. parallelism with a visual and intuitive explanation! Here is a short summary:
Task: Let's burn a pile of obsolete language manuals! One at a time!
Concurrency: There are many possible concurrent decompositions of the task! One example:
Parallelism: The previous configuration occurs in parallel if there are at least 2 gophers working at the same time.
To add onto what others have said:
Concurrency is like having a juggler juggle many balls. Regardless of how it seems, the juggler is only catching/throwing one ball per hand at a time. Parallelism is having multiple jugglers juggle balls simultaneously.
Say you have a program that has two threads. The program can run in two ways:
      Concurrency                     Concurrency + parallelism
    (Single-Core CPU)                     (Multi-Core CPU)
         ___                                  ___ ___
        |th1|                                |th1|th2|
        |   |                                |   |___|
        |___|___                             |   |___
            |th2|                            |___|th2|
         ___|___|                             ___|___|
        |th1|                                |th1|
        |___|___                             |   |___
            |th2|                            |   |th2|
In both cases we have concurrency from the mere fact that we have more than one thread running.
If we ran this program on a computer with a single CPU core, the OS would be switching between the two threads, allowing one thread to run at a time.
If we ran this program on a computer with a multi-core CPU then we would be able to run the two threads in parallel - side by side at the exact same time.
Concurrency: If two or more problems are solved by a single processor.
Parallelism: If one problem is solved by multiple processors.
Imagine learning a new programming language by watching a video tutorial. You need to pause the video, apply what's been said in code, then continue watching. That's concurrency.
Now you're a professional programmer. And you enjoy listening to calm music while coding. That's Parallelism.
As Andrew Gerrand said on the Go Blog:
Concurrency is about dealing with lots of things at once. Parallelism
is about doing lots of things at once.
Enjoy.
I will try to explain with an interesting and easy-to-understand example. :)
Assume that an organization runs a chess tournament where 10 players (with equal chess-playing skills) will challenge a professional champion chess player. And since chess is a 1:1 game, the organizers have to conduct the 10 games in a time-efficient manner so that they can finish the whole event as quickly as possible.
Hopefully the following scenarios will easily describe multiple ways of conducting these 10 games:
1) SERIAL - let's say that the professional plays with each person one by one, i.e. starts and finishes the game with one person and then starts the next game with the next person, and so on. In other words, they decided to conduct the games sequentially. So if one game takes 10 mins to complete, then 10 games will take 100 mins. Also assume that the transition from one game to the other takes 6 secs, so for 10 games it will be 54 secs (approx. 1 min).
So the whole event will approximately complete in 101 mins (WORST APPROACH)
2) CONCURRENT - let's say that the professional plays his turn and moves on to the next player, so all 10 players are playing simultaneously, but the professional is not with two persons at a time; he plays his turn and moves on to the next person. Now assume the professional takes 6 secs to play his turn, and the transition time between two players is also 6 secs, so the total transition time to get back to the first player is 1 min (10x6 secs). Therefore, by the time he is back to the first person with whom the event was started, 2 mins have passed (10 x time_per_turn_by_champion + 10 x transition_time = 2 mins).
Assuming that every player takes 45 secs to complete their turn, then based on 10 mins per game from the SERIAL event, the number of rounds before a game finishes should be 600/(45+6) = 11 rounds (approx.).
So the whole event will approximately complete in 11 x time_per_turn_by_player_&_champion + 11 x transition_time_across_10_players = 11x51 + 11x60 secs = 561 + 660 = 1221 secs = 20.35 mins (approximately).
SEE THE IMPROVEMENT from 101 mins to 20.35 mins (BETTER APPROACH)
3) PARALLEL - let's say the organizers get some extra funds and decide to invite two professional champion players (both equally capable), divide the set of the same 10 players (challengers) into two groups of 5 each, and assign one group to each champion. Now the event is progressing in parallel in these two sets, i.e. at least two players (one in each group) are playing against the two professional players in their respective groups.
However, within a group the professional still takes one player at a time (i.e. sequentially), so without any calculation you can easily deduce that the whole event will approximately complete in 101/2 = 50.5 mins.
SEE THE IMPROVEMENT from 101 mins to 50.5 mins (GOOD APPROACH)
4) CONCURRENT + PARALLEL - in the above scenario, let's say that the two champion players play concurrently (read the 2nd point) with the 5 players in their respective groups, so now the games across groups are running in parallel, but within a group they are running concurrently.
So the games in one group will approximately complete in 11 x time_per_turn_by_player_&_champion + 11 x transition_time_across_5_players = 11x51 + 11x30 = 561 + 330 = 891 secs ≈ 14.85 mins (approximately).
So the whole event (involving two such parallel-running groups) will approximately complete in about 15 mins.
SEE THE IMPROVEMENT from 101 mins to about 15 mins (BEST APPROACH)
NOTE: in the above scenario, if you replace the 10 players with 10 similar jobs and the two professional players with two CPU cores, then the following ordering will remain true:
SERIAL > PARALLEL > CONCURRENT > CONCURRENT+PARALLEL
(NOTE: this order might change for other scenarios as this ordering highly depends on inter-dependency of jobs, communication needs between jobs and transition overhead between jobs)
Concurrent execution comes in two types: non-parallel concurrent execution and parallel concurrent execution (also known as parallelism).
The key difference is that, to the human eye, threads in non-parallel concurrency appear to run at the same time, but in reality they don't. In non-parallel concurrency, threads rapidly switch and take turns using the processor through time-slicing.
In parallelism, multiple processors are available, so multiple threads can run on different processors at the same time.
Reference: Introduction to Concurrency in Programming Languages
Simple example:
Concurrent is: "Two queues accessing one ATM machine"
Parallel is: "Two queues and two ATM machines"
Parallelism is the simultaneous execution of processes on multiple cores per CPU or on multiple CPUs (on a single motherboard).
Concurrency is when parallelism is emulated on a single core/CPU by using scheduling algorithms that divide the CPU's time (time slices). Processes are interleaved.
Units:
1 or many cores in a single CPU (pretty much all modern day processors)
1 or many CPUs on a motherboard (think old school servers)
1 application is 1 program (think Chrome browser)
1 program can have 1 or many processes (think each Chrome browser tab is a process)
1 process can have 1 or many threads from 1 program (a Chrome tab playing a Youtube video in 1 thread, another thread spawned for the comments section, another for users' login info)
Thus, 1 program can have 1 or many threads of execution
1 process is its thread(s) plus the memory resources allocated by the OS (heap, registers, stack, class memory)
They solve different problems. Concurrency solves the problem of having scarce CPU resources and many tasks. So, you create threads or independent paths of execution through code in order to share time on the scarce resource. Up until recently, concurrency has dominated the discussion because of CPU availability.
Parallelism solves the problem of finding enough tasks and appropriate tasks (ones that can be split apart correctly) and distributing them over plentiful CPU resources. Parallelism has always been around of course, but it's coming to the forefront because multi-core processors are so cheap.
concurrency:
multiple execution flows with the potential to share resources
Ex:
two threads competing for an I/O port.
parallelism:
splitting a problem into multiple similar chunks.
Ex:
parsing a big file by running two processes on each half of the file.
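A rough sketch of that last example in Python, assuming a hypothetical input file big.txt (and ignoring the word that straddles the boundary between the two halves):

import multiprocessing as mp
import os

def count_words(path, start, length):
    # each process parses its own half of the file independently
    with open(path, "rb") as f:
        f.seek(start)
        return len(f.read(length).split())

if __name__ == "__main__":
    path = "big.txt"                              # hypothetical input file
    half = os.path.getsize(path) // 2
    with mp.Pool(processes=2) as pool:
        counts = pool.starmap(count_words, [(path, 0, half), (path, half, None)])
    print(sum(counts))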
Concurrency => When multiple tasks are performed in overlapping time periods with shared resources (potentially maximizing resource utilization).
Parallel => When a single task is divided into multiple simple independent sub-tasks which can be performed simultaneously.
Concurrency vs Parallelism
Rob Pike in 'Concurrency Is Not Parallelism'
Concurrency is about dealing with lots of things at once.
Parallelism is about doing lots of things at once.
[Concurrency theory]
Concurrency - handles several tasks at once
Parallelism - handles several threads at once
If you want to explain this to a 9-year-old:
Think of it as servicing queues, where a server can only serve the first job in a queue.
1 server , 1 job queue (with 5 jobs) -> no concurrency, no parallelism (Only one job is being serviced to completion, the next job in the queue has to wait till the serviced job is done and there is no other server to service it)
1 server, 2 or more different queues (with 5 jobs per queue) -> concurrency (since the server is sharing time between the first jobs of all the queues, equally or weighted), still no parallelism since at any instant there is one and only one job being serviced.
2 or more servers, one queue -> parallelism (2 jobs done at the same instant) but no concurrency (the servers are not sharing time; the 3rd job has to wait till one of the servers completes.)
2 or more servers, 2 or more different queues -> concurrency and parallelism
In other words, concurrency is sharing time to complete a job. It MAY take the same time to complete the job, but at least it gets started early. The important thing is that jobs can be sliced into smaller jobs, which allows interleaving.
Parallelism is achieved with just more CPUs, servers, people, etc. running in parallel.
Keep in mind that if the resources are shared, pure parallelism cannot be achieved; but this is where concurrency finds its best practical use, taking up another job that doesn't need that resource.
I really like Paul Butcher's answer to this question (he's the author of Seven Concurrency Models in Seven Weeks):
Although they’re often confused, parallelism and concurrency are
different things. Concurrency is an aspect of the problem domain—your
code needs to handle multiple simultaneous (or near simultaneous)
events. Parallelism, by contrast, is an aspect of the solution
domain—you want to make your program run faster by processing
different portions of the problem in parallel. Some approaches are
applicable to concurrency, some to parallelism, and some to both.
Understand which you’re faced with and choose the right tool for the
job.
In electronics, serial and parallel represent a type of static topology, determining the actual behaviour of the circuit. When there is no concurrency, parallelism is deterministic.
In order to describe dynamic, time-related phenomena, we use the terms sequential and concurrent. For example, a certain outcome may be obtained via a certain sequence of tasks (e.g. a recipe). When we are talking with someone, we are producing a sequence of words. However, in reality, many other processes occur at the same moment, and thus concur to the actual result of a certain action. If a lot of people are talking at the same time, concurrent talks may interfere with our sequence, but the outcomes of this interference are not known in advance. Concurrency introduces indeterminacy.
The serial/parallel and sequential/concurrent characterizations are orthogonal. An example of this is digital communication. In a serial adapter, a digital message is temporally (i.e. sequentially) distributed along the same communication line (e.g. one wire). In a parallel adapter, it is also divided across parallel communication lines (e.g. many wires), and then reconstructed on the receiving end.
Let us imagine a game with 9 children. If we arrange them in a chain, give a message to the first and receive it at the end, we have serial communication. The message is composed of several words, forming a sequence of communication units.
I like ice-cream so much. > X > X > X > X > X > X > X > X > X > ....
This is a sequential process reproduced on a serial infrastructure.
Now, let us imagine dividing the children into groups of 3. We divide the phrase into three parts, give the first to the child at the head of the line on our left, the second to the center line's child, etc.
I like ice-cream so much. > I like > X > X > X > .... > ....
> ice-cream > X > X > X > ....
> so much > X > X > X > ....
This is a sequential process reproduced on a parallel infrastructure (though still partially serialized).
In both cases, supposing there is perfect communication between the children, the result is determined in advance.
If there are other persons talking to the first child at the same time as you, then we will have concurrent processes. We do not know which process will be considered by the infrastructure, so the final outcome is not determined in advance.
I'm going to offer an answer that conflicts a bit with some of the popular answers here. In my opinion, concurrency is a general term that includes parallelism. Concurrency applies to any situation where distinct tasks or units of work overlap in time. Parallelism applies more specifically to situations where distinct units of work are evaluated/executed at the same physical time. The raison d'etre of parallelism is speeding up software that can benefit from multiple physical compute resources. The other major concept that fits under concurrency is interactivity. Interactivity applies when the overlapping of tasks is observable from the outside world. The raison d'etre of interactivity is making software that is responsive to real-world entities like users, network peers, hardware peripherals, etc.
Parallelism and interactivity are almost entirely independent dimensions of concurrency. For a particular project, developers might care about either, both, or neither. They tend to get conflated, not least because the abomination that is threads gives a reasonably convenient primitive for doing both.
A little more detail about parallelism:
Parallelism exists at very small scales (e.g. instruction-level parallelism in processors), medium scales (e.g. multicore processors) and large scales (e.g. high-performance computing clusters). Pressure on software developers to expose more thread-level parallelism has increased in recent years, because of the growth of multicore processors. Parallelism is intimately connected to the notion of dependence. Dependences limit the extent to which parallelism can be achieved; two tasks cannot be executed in parallel if one depends on the other (ignoring speculation).
There are lots of patterns and frameworks that programmers use to express parallelism: pipelines, task pools, aggregate operations on data structures ("parallel arrays").
A little more detail about interactivity:
The most basic and common way to do interactivity is with events (i.e. an event loop and handlers/callbacks). For simple tasks events are great. Trying to do more complex tasks with events gets into stack ripping (a.k.a. callback hell; a.k.a. control inversion). When you get fed up with events you can try more exotic things like generators, coroutines (a.k.a. Async/Await), or cooperative threads.
For the love of reliable software, please don't use threads if what you're going for is interactivity.
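To see the difference in shape, here is the same toy task written once with a callback and once as a coroutine (a Python sketch; the delayed "event" is simulated with the loop's call_later):

import asyncio

# callback style: the continuation is ripped out into a separate function
def on_timer(loop):
    print("event handled (callback)")
    loop.stop()

def callback_style():
    loop = asyncio.new_event_loop()
    loop.call_later(0.1, on_timer, loop)
    loop.run_forever()
    loop.close()

# coroutine style: the same logic reads top to bottom
async def coroutine_style():
    await asyncio.sleep(0.1)
    print("event handled (coroutine)")

callback_style()
asyncio.run(coroutine_style())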
Curmudgeonliness
I dislike Rob Pike's "concurrency is not parallelism; it's better" slogan. Concurrency is neither better nor worse than parallelism. Concurrency includes interactivity which cannot be compared in a better/worse sort of way with parallelism. It's like saying "control flow is better than data".
From the book Linux System Programming by Robert Love:
Concurrency, Parallelism, and Races
Threads create two related but distinct phenomena: concurrency and
parallelism. Both are bittersweet, touching on the costs of threading
as well as its benefits. Concurrency is the ability of two or more
threads to execute in overlapping time periods. Parallelism is
the ability to execute two or more threads simultaneously.
Concurrency can occur without parallelism: for example, multitasking
on a single processor system. Parallelism (sometimes emphasized as
true parallelism) is a specific form of concurrency requiring multiple processors (or a single processor capable of multiple engines
of execution, such as a GPU). With concurrency, multiple threads make
forward progress, but not necessarily simultaneously. With
parallelism, threads literally execute in parallel, allowing
multithreaded programs to utilize multiple processors.
Concurrency is a programming pattern, a way of approaching problems.
Parallelism is a hardware feature, achievable through concurrency.
Both are useful.
This explanation is consistent with the accepted answer. Actually, the concepts are far simpler than we think. Don't think of them as magic. Concurrency is about a period of time, while parallelism is about exactly the same time, simultaneously.
Concurrency is the generalized form of parallelism. For example, a parallel program can also be called concurrent, but the reverse is not true.
Concurrent execution is possible on a single processor (multiple threads, managed by a scheduler or thread pool).
Parallel execution is not possible on a single processor; it requires multiple processors (one process per processor).
Distributed computing is also a related topic, and it can also be called concurrent computing, but the reverse is not true, like parallelism.
For details read this research paper
Concepts of Concurrent Programming
I really liked this graphical representation from another answer - I think it answers the question much better than a lot of the answers above.
Parallelism vs Concurrency
When two threads are running in parallel, they are both running at the same time. For example, if we have two threads, A and B, then their parallel execution would look like this:
CPU 1: A ------------------------->
CPU 2: B ------------------------->
When two threads are running concurrently, their execution overlaps. Overlapping can happen in one of two ways: either the threads are executing at the same time (i.e. in parallel, as above), or their executions are being interleaved on the processor, like so:
CPU 1: A -----------> B ----------> A -----------> B ---------->
So, for our purposes, parallelism can be thought of as a special case of concurrency.
Source: Another answer here
Hope that helps.
"Concurrency" is when there are multiple things in progress.
"Parallelism" is when concurrent things are progressing at the same time.
Examples of concurrency without parallelism:
Multiple threads on a single core.
Multiple messages in a Win32 message queue.
Multiple SqlDataReaders on a MARS connection.
Multiple JavaScript promises in a browser tab.
Note, however, that the difference between concurrency and parallelism is often a matter of perspective. The above examples are non-parallel from the perspective of (observable effects of) executing your code. But there is instruction-level parallelism even within a single core. There are pieces of hardware doing things in parallel with the CPU and then interrupting the CPU when done. The GPU could be drawing to the screen while your window procedure or event handler is being executed. The DBMS could be traversing B-trees for the next query while you are still fetching the results of the previous one. The browser could be doing layout or networking while your Promise.resolve() is being executed. Etc., etc...
So there you go. The world is as messy as always ;)
The simplest and most elegant way of understanding the two in my opinion is this. Concurrency allows interleaving of execution and so can give the illusion of parallelism. This means that a concurrent system can run your Youtube video alongside you writing up a document in Word, for example. The underlying OS, being a concurrent system, enables those tasks to interleave their execution. Because computers execute instructions so quickly, this gives the appearance of doing two things at once.
Parallelism is when such things really are in parallel. In the example above, you might find the video processing code is being executed on a single core, and the Word application is running on another. Note that this means that a concurrent program can also be in parallel! Structuring your application with threads and processes enables your program to exploit the underlying hardware and potentially be done in parallel.
Why not have everything be parallel then? One reason is because concurrency is a way of structuring programs and is a design decision to facilitate separation of concerns, whereas parallelism is often used in the name of performance. Another is that some things fundamentally cannot fully be done in parallel. An example of this would be adding two things to the back of a queue - you cannot insert both at the same time. Something must go first and the other behind it, or else you mess up the queue. Although we can interleave such execution (and so we get a concurrent queue), you cannot have it parallel.
Hope this helps!
"Concurrent" is doing things -- anything -- at the same time. They could be different things, or the same thing. Despite the accepted answer, which is lacking, it's not about "appearing to be at the same time." It's really at the same time. You need multiple CPU cores, either using shared memory within one host, or distributed memory on different hosts, to run concurrent code. Pipelines of 3 distinct tasks that are concurrently running at the same time are an example: Task-level-2 has to wait for units completed by task-level-1, and task-level-3 has to wait for units of work completed by task-level-2. Another example is concurrency of 1-producer with 1-consumer; or many-producers and 1-consumer; readers and writers; et al.
"Parallel" is doing the same things at the same time. It is concurrent, but furthermore it is the same behavior happening at the same time, and most typically on different data. Matrix algebra can often be parallelized, because you have the same operation running repeatedly: For example the column sums of a matrix can all be computed at the same time using the same behavior (sum) but on different columns. It is a common strategy to partition (split up) the columns among available processor cores, so that you have close to the same quantity of work (number of columns) being handled by each processor core. Another way to split up the work is bag-of-tasks where the workers who finish their work go back to a manager who hands out the work and get more work dynamically until everything is done. Ticketing algorithm is another.
Not just numerical code can be parallelized. Files too often can be processed in parallel. In a natural language processing application, for each of the millions of document files, you may need to count the number of tokens in the document. This is parallel, because you are counting tokens, which is the same behavior, for every file.
In other words, parallelism is when same behavior is being performed concurrently. Concurrently means at the same time, but not necessarily the same behavior. Parallel is a particular kind of concurrency where the same thing is happening at the same time.
Relevant terms include atomic instructions, critical sections, mutual exclusion, spin-waiting, semaphores, monitors, barriers, message-passing, map-reduce, heartbeat, ring, ticketing algorithms, threads, MPI, OpenMP.
Gregory Andrews' work is a top textbook on it: Multithreaded, Parallel, and Distributed Programming.
Concurrency can involve tasks running simultaneously or not (they can indeed run on separate processors/cores, but they can just as well run in "ticks"). What is important is that concurrency always refers to doing a piece of one greater task. So basically it's a part of some computation. You have to be smart about what you can do simultaneously, what you can't, and how to synchronize.
Parallelism means that you're just doing some things simultaneously. They don't need to be a part of solving one problem. Your threads can, for instance, solve a single problem each. Of course synchronization stuff also applies but from different perspective.
Parallelism:
Having multiple threads do similar tasks which are independent of each other in terms of the data and resources they require. E.g.: a Google crawler can spawn thousands of threads, and each thread can do its task independently.
Concurrency:
Concurrency comes into the picture when you have shared data or shared resources among the threads. In a transactional system this means you have to synchronize the critical sections of the code using techniques like locks, semaphores, etc.
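A minimal Python sketch of that synchronization: the lock turns the read-modify-write on the shared counter into a critical section, so no updates are lost.

import threading

balance = 0
lock = threading.Lock()

def deposit(amount, times):
    global balance
    for _ in range(times):
        with lock:                  # critical section: one thread at a time
            balance += amount       # read-modify-write on shared data

threads = [threading.Thread(target=deposit, args=(1, 100_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)                      # 400000: every update preserved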
The explanation from this source was helpful for me:
Concurrency is related to how an application handles multiple tasks it
works on. An application may process one task at a time
(sequentially) or work on multiple tasks at the same time
(concurrently).
Parallelism on the other hand, is related to how an application
handles each individual task. An application may process the task
serially from start to end, or split the task up into subtasks which
can be completed in parallel.
As you can see, an application can be concurrent, but not parallel.
This means that it processes more than one task at the same time, but
the tasks are not broken down into subtasks.
An application can also be parallel but not concurrent. This means
that the application only works on one task at a time, and this task
is broken down into subtasks which can be processed in parallel.
Additionally, an application can be neither concurrent nor parallel.
This means that it works on only one task at a time, and the task is
never broken down into subtasks for parallel execution.
Finally, an application can also be both concurrent and parallel, in
that it both works on multiple tasks at the same time, and also breaks
each task down into subtasks for parallel execution. However, some of
the benefits of concurrency and parallelism may be lost in this
scenario, as the CPUs in the computer are already kept reasonably busy
with either concurrency or parallelism alone. Combining it may lead to
only a small performance gain or even performance loss.
Concurrent programming regards operations that appear to overlap and is primarily concerned with the complexity that arises due to non-deterministic control flow. The quantitative costs associated with concurrent programs are typically both throughput and latency. Concurrent programs are often IO bound but not always, e.g. concurrent garbage collectors are entirely on-CPU. The pedagogical example of a concurrent program is a web crawler. This program initiates requests for web pages and accepts the responses concurrently as the results of the downloads become available, accumulating a set of pages that have already been visited. Control flow is non-deterministic because the responses are not necessarily received in the same order each time the program is run. This characteristic can make it very hard to debug concurrent programs. Some applications are fundamentally concurrent, e.g. web servers must handle client connections concurrently. Erlang is perhaps the most promising upcoming language for highly concurrent programming.
Parallel programming concerns operations that are overlapped for the specific goal of improving throughput. The difficulties of concurrent programming are evaded by making control flow deterministic. Typically, programs spawn sets of child tasks that run in parallel and the parent task only continues once every subtask has finished. This makes parallel programs much easier to debug. The hard part of parallel programming is performance optimization with respect to issues such as granularity and communication. The latter is still an issue in the context of multicores because there is a considerable cost associated with transferring data from one cache to another. Dense matrix-matrix multiply is a pedagogical example of parallel programming and it can be solved efficiently by using Strassen's divide-and-conquer algorithm and attacking the sub-problems in parallel. Cilk is perhaps the most promising language for high-performance parallel programming on shared-memory computers (including multicores).
Copied from my answer: https://stackoverflow.com/a/3982782

Is the point of multi-threading to increase your CPU usage?

I am very new to the concept of multi-threading, but to my understanding the whole point of a multi-threading architecture (I learnt there is hardware multi-threading and software multi-threading; not that I completely understand each concept, but I think I am talking about the hardware aspect here) is to "keep your CPU busy". This is, for example, to process another task when your current one is fetching data from your hard disk.
If I am correct, for a non-multi-threading CPU, if it already has near 100% usage, then switching to a multi-threading one would not help much. Am I correct?
I am sure my statement of the problem is full of inaccuracy but I hope I made myself understood.
You have been given the task of building a house; your team is composed of you, the supervisor, and a pool of workers.
When your boss comes by to check on the progress, what would you like him to see? One worker doing all the job and the others watching, or all the workers busy?
You want to keep the workers busy by giving them independent tasks; the more workers, the harder this is.
Furthermore, there are some issues to take into account: worker A was given the task of building a wall, and is building it. Before the wall gets too tall, the concrete needs to dry, so A spends a lot of time waiting.
During this wait they could help somewhere else.
Asking A to help somewhere else while they are in the building-the-wall step is pointless; they either need to decline or stop what they are doing.
One way or the other, you won't get any benefit.
The workers are the equivalent of threads.
The house building is the equivalent of a multi-threaded process.
Worker A building the wall is the equivalent of a CPU-bound process, a process that uses the CPU as much as it needs.
The concrete drying is the equivalent of an I/O operation; it will complete by itself without the need of any worker.
Worker A waiting for the concrete to dry is the equivalent of an I/O-bound process; it mostly does nothing.
Worker A being always busy is the equivalent of an optimal scheduling/multi-threading algorithm.
Now you are given the task of studying a CS book; your team is composed of you, a student who cannot read, and a pool of readers.
How would you assign the readers? You can't have them each read a chapter individually, as you can't listen to more than one person.
So even if you have a lot of workers, you pick one and have them read the book sequentially.
Reading a book is an example of an inherently sequential problem that won't benefit from multi-threading, while building a house is an example of a very parallelizable problem.
Handling multiple workers is not easy: worker A may start a wall because they took a look at the concrete stash and realized there was enough of it, but didn't claim it. Worker B needs some concrete and takes it from the stash; now A no longer has enough concrete.
This is the equivalent of a race condition: A checks and uses a resource at two different times (as two different, divisible operations), and the result depends on the timing of the workers.
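That check-then-act race looks like this in code (a deliberately broken Python sketch; with unlucky timing, both workers pass the check before either takes the bag):

import threading

stash = 1                                   # bags of concrete left

def build_wall():
    global stash
    if stash >= 1:                          # check: "there is enough concrete"
        # another worker may grab the bag between the check and the use
        stash -= 1                          # act: take the concrete

workers = [threading.Thread(target=build_wall) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(stash)    # usually 0, but can be -1 when both pass the check first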
If you think of the CPU as "units that can do things", you will realize that having more "units that can do things" is better only if they are not standing there gazing into infinity.
Hence all the literature about multi-threading.
I originally misunderstood the question. Your statement is correct for software multi-threading, where we parallelize some algorithm to run the computations in multiple threads. In this case, if your CPU is already loaded with work, you can't make it work faster by executing code in multiple threads. Moreover, you can even expect a decrease in performance due to the overhead of multi-threaded communication and context switches. But in the modern world it is not easy to find a single-core CPU (with the exception of embedded applications), so in most cases you do need threads to fully utilize the CPU's computational ability.
But for hardware multi-threading the situation is different, because it is an absolutely different thing. A CPU has circuits that perform arithmetic operations and circuits responsible for program flow. Now we play a trick: we double the number of the second kind of circuit and have them share the arithmetic circuits. Now two threads can execute different commands simultaneously: they can't both do a summation at the same time, but one can add numbers while the second divides something. And that's how performance is gained. So 100% load is now a different load, because you have enabled additional circuits in the CPU. The relative value is the same, but the absolute performance is higher.

What are the tell-tale signs that my code needs to make use of multi-threading?

I am using a third party API which performs what I would assume are expensive operations in terms of time/resources used (image recognition, etc). What tell-tale signs are there that the code under test should be made to use threads to increase performance?
I have a profiler and will be profiling the code I write which will rely on this API.
Thanks
If you have two distinct sequences of events that don't depend on one another, then consider it. If you have to write bunches of logic just to make sure that two operations aren't getting in each other's way, it pays off by making the two pieces of code clearer.
If on the other hand you find that, in attempting to make something multithreaded, you have to add gobs of code to communicate results between the threads, because one (or both) can't proceed without some information from the other, that's a good sign that you are trying to make threads where they don't make sense.
One case where it makes sense to go multi-threaded, even when you have to add communication to do it, is when you have one task that needs to stay available for input, and another to do heavy computing. One thread may poll for input from somewhere, blocking when none is available, so that when input is available it is responded to in a timely manner, and feed jobs to another 'worker' thread, so that processing continues at all times, not just when there's input.
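A sketch of that poller/worker split in Python, with a queue carrying jobs between the two threads (the input source is simulated by a fixed list):

import queue
import threading

jobs = queue.Queue()

def poller():
    # stays responsive to input; blocking here never stalls the worker
    for line in ["job1", "job2", "quit"]:   # simulated input source
        jobs.put(line)

def worker():
    while True:
        job = jobs.get()                    # blocks until input arrives
        if job == "quit":
            break
        print("processing", job)            # heavy computing would go here

threading.Thread(target=poller).start()
worker()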
One other thing to consider, is that even when a job is 'embarrassingly parallel' (i.e., requiring little or no communication between the parallelized parts), there are cases where multithreading may not be worthwhile. If your CPU can assign different threads to different cores, multithreading will give you a speed up, by allowing multiple cores to chew through the work simultaneously. But on a single core processor, or even a multi-core one with an unfortunate OS, having multiple threads will not speed things up, as the one core will still have to get through all the work.
Image processing is often CPU-bound. However, if your image-processing API is already designed to leverage multiple CPUs, multi-threading probably won't help you. The strategy I usually use to quickly determine whether multi-threading will help is to write a simple program that does the relevant processing over and over again. Then I run it on a set of data, and afterwards run two instances of the process simultaneously, each on half of the data. There is no need to split the data evenly for such a test; if one process runs out of work early, the remainder simply runs as a single instance. Timing is done via wall-clock time. I mean this literally: pick a data set large enough that a run takes at least a full minute, ideally 5 minutes or longer.
If running two copies at the same time improves throughput significantly, multi-threading is probably a good idea. Obviously this strategy is only practical in certain instances, and in some cases multi-threading can involve leveraging shared output in ways this trick can't emulate. But it's an absurdly easy test to run, and it rarely requires much, if any, code to be written.
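The same trick can be approximated in-process with two threads instead of two OS processes. A rough C# sketch, where ProcessItem is a hypothetical stand-in for the expensive API call:

using System;
using System.Diagnostics;
using System.Threading;

// Time N items on one thread, then the same N items split across two threads.
// ProcessItem is a placeholder for the expensive third-party call.
static void ProcessItem(int i)
{
    double x = i;                                   // burn some CPU
    for (int k = 0; k < 500_000; k++) x = Math.Sqrt(x + k);
}

var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000; i++) ProcessItem(i);
Console.WriteLine($"one worker:  {sw.Elapsed}");

sw.Restart();
var t1 = new Thread(() => { for (int i = 0; i < 500; i++) ProcessItem(i); });
var t2 = new Thread(() => { for (int i = 500; i < 1000; i++) ProcessItem(i); });
t1.Start(); t2.Start();
t1.Join(); t2.Join();
Console.WriteLine($"two workers: {sw.Elapsed}");    // much faster => threading likely helps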

What kinds of applications need to be multi-threaded?

What are some concrete examples of applications that need to be multi-threaded, or don't need to be, but are much better that way?
Answers would be best if in the form of one application per post that way the most applicable will float to the top.
There is no hard and fast answer, but most of the time you will not see any advantage for systems where the workflow/calculation is sequential. If however the problem can be broken down into tasks that can be run in parallel (or the problem itself is massively parallel [as some mathematics or analytical problems are]), you can see large improvements.
If your target hardware is single processor/core, you're unlikely to see any improvement with multi-threaded solutions (as only one thread runs at a time anyway!).
Writing multi-threaded code is often harder as you may have to invest time in creating thread management logic.
Some examples
Image processing can often be done in parallel (e.g. split the image into 4 and do the work in 1/4 of the time), but it depends on the algorithm being run whether that makes sense.
Rendering of animation (from 3DMax, etc.) is massively parallel, as each frame can be rendered independently of the others, meaning that tens or hundreds of computers can be chained together to help out.
GUI programming often benefits from at least two threads when doing something slow, e.g. processing a large number of files; this allows the interface to remain responsive while the worker does the hard work (in C#, BackgroundWorker is an example of this; see the sketch below).
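A minimal BackgroundWorker sketch, assuming a WinForms/WPF-style app where the completion handler is the place to touch the UI:

using System;
using System.ComponentModel;
using System.Threading;

var worker = new BackgroundWorker();

worker.DoWork += (sender, e) =>
{
    Thread.Sleep(5000);   // stand-in for processing a large number of files;
    e.Result = 42;        // runs on a thread-pool thread, so the UI stays responsive
};

worker.RunWorkerCompleted += (sender, e) =>
{
    // In a WinForms/WPF app this is marshalled back to the UI thread,
    // so it is safe to update controls here.
    Console.WriteLine($"finished with result {e.Result}");
};

worker.RunWorkerAsync();  // returns immediately; the UI thread carries on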
GUIs are an interesting area, as the "responsiveness" of the interface can be maintained without multi-threading if the worker algorithm keeps the main GUI "alive" by giving it time; in Windows API terms (before .NET, etc.) this could be achieved with a primitive loop and no need for threading:
MSG msg;
while (GetMessage(&msg, hwnd, 0, 0))
{
    TranslateMessage(&msg);
    DispatchMessage(&msg);
    // do some stuff here and then release; the loop will come back
    // almost immediately (unless the user has quit)
}
Servers are typically multi-threaded (web servers, radius servers, email servers, any server): you usually want to be able to handle multiple requests simultaneously. If you do not want to wait for a request to end before you start to handle a new request, then you mainly have two options:
Run a process with multiple threads
Run multiple processes
Launching a process is usually more resource-intensive than launching a thread (or picking one from a thread pool), so servers are usually multi-threaded. Moreover, threads can communicate directly since they share the same memory space.
The problem with multiple threads is that they are usually harder to code right than multiple processes.
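As a rough illustration of the multi-threaded option, here is a hypothetical minimal echo server in C# that hands each accepted connection to a thread-pool thread, so the accept loop never waits for one request to finish before taking the next:

using System.Net;
using System.Net.Sockets;
using System.Threading;

var listener = new TcpListener(IPAddress.Any, 8080);
listener.Start();

while (true)
{
    TcpClient client = listener.AcceptTcpClient();  // blocks until a request arrives
    ThreadPool.QueueUserWorkItem(_ =>               // handle it on a pooled thread
    {
        using (client)
        {
            NetworkStream stream = client.GetStream();
            var buffer = new byte[4096];
            int n;
            while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
                stream.Write(buffer, 0, n);         // echo the bytes back
        }
    });
}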
There are really three classes of reasons that multithreading would be applied:
Execution Concurrency to improve compute performance: If you have a problem that can be broken down into pieces and you also have more than one execution unit (processor core) available, then dispatching the pieces into separate threads is the path to using two or more cores at once.
Concurrency of CPU and IO Operations: This is similar in thinking to the first one, but in this case the objective is to keep the CPU busy AND keep I/O operations (e.g. disk I/O) moving in parallel, rather than alternating between them.
Program Design and Responsiveness: Many types of programs can take advantage of threading as a program design benefit to make the program more responsive to the user. For example the program can be interacting via the GUI and also doing something in the background.
Concrete Examples:
Microsoft Word: Edit document while the background grammar and spell checker works to add all the green and red squiggle underlines.
Microsoft Excel: Automatic background recalculations after cell edits
Web Browser: Dispatch multiple threads to load each of the several HTML references in parallel during a single page load. Speeds page loads and maximizes TCP/IP data throughput.
These days, the answer should be Any application that can be.
The speed of execution for a single thread pretty much peaked years ago - processors have been getting faster by adding cores, not by increasing clock speeds. There have been some architectural improvements that make better use of the available clock cycles, but really, the future is taking advantage of threading.
There is a ton of research going on into finding ways of parallelizing activities that we traditionally wouldn't think of parallelizing. Even something as simple as finding a substring within a string can be parallelized.
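For instance, here is a C# sketch of a parallel substring search: the haystack is cut into chunks that overlap by needle.Length - 1 characters so a match straddling a chunk boundary isn't missed, and the chunks are scanned concurrently. Note that it reports some match, not necessarily the first one.

using System;
using System.Threading;
using System.Threading.Tasks;

Console.WriteLine(ParallelIndexOf("the quick brown fox", "brown")); // 10

static int ParallelIndexOf(string haystack, string needle, int chunks = 4)
{
    int chunkSize = (haystack.Length + chunks - 1) / chunks;
    int found = -1;
    Parallel.For(0, chunks, i =>
    {
        int start = i * chunkSize;
        // Overlap by needle.Length - 1 so boundary-straddling matches are seen.
        int length = Math.Min(chunkSize + needle.Length - 1, haystack.Length - start);
        if (length < needle.Length) return;
        int idx = haystack.IndexOf(needle, start, length, StringComparison.Ordinal);
        if (idx >= 0) Interlocked.CompareExchange(ref found, idx, -1);
    });
    return found;  // index of some occurrence, or -1 if absent
}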
Basically there are two reasons to multi-thread:
To be able to do processing tasks in parallel. This only applies if you have multiple cores/processors, otherwise on a single core/processor computer you will slow the task down compared to the version without threads.
I/O whether that be networked I/O or file I/O. Normally if you call a blocking I/O call, the process has to wait for the call to complete. Since the processor/memory are several orders of magnitude quicker than a disk drive (and a network is even slower) it means the processor will be waiting a long time. The computer will be working on other things but your application will not be making any progress. However if you have multiple threads, the computer will schedule your application and the other threads can execute. One common use is a GUI application. Then while the application is doing I/O the GUI thread can keep refreshing the screen without looking like the app is frozen or not responding. Even on a single processor putting I/O in a different thread will tend to speed up the application.
The single-threaded alternative to #2 is to use asynchronous calls, which return immediately and let you keep control of your program. You then have to detect when the I/O completes and handle its result. It is often simpler just to do the I/O with synchronous calls on a dedicated thread, as those tend to be easier to use.
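A small C# sketch of the two styles, assuming a file named "data.txt" exists:

using System;
using System.IO;
using System.Threading;

// Style 1: synchronous call on a dedicated thread -- simple, but it occupies
// a thread for the duration of the I/O.
var t = new Thread(() =>
{
    string text = File.ReadAllText("data.txt"); // blocks this thread only
    Console.WriteLine(text.Length);
});
t.Start();
t.Join();

// Style 2: asynchronous call -- no thread is blocked while the disk works,
// but you must pick the result up when the I/O completes.
string text2 = await File.ReadAllTextAsync("data.txt");
Console.WriteLine(text2.Length);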
The reason to use threads instead of separate processes is that threads can share data more easily than multiple processes can, and sometimes switching between threads is less expensive than switching between processes.
As another note on #1: Python threads won't help here, because in Python only one Python instruction can be executed at a time (due to the GIL, the Global Interpreter Lock). I use that as an example, but you should check how your own language behaves. In Python, if you want to do parallel calculations, you need to use separate processes.
Many GUI frameworks are multi-threaded. This allows you to have a more responsive interface. For example, you can click on a "Cancel" button at any time while a long calculation is running.
Note that there are other solutions for this (for example the program can pause the calculation every half-a-second to check whether you clicked on the Cancel button or not), but they do not offer the same level of responsiveness (the GUI might seem to freeze for a few seconds while a file is being read or a calculation being done).
All the answers so far are focusing on the fact that multi-threading or multi-processing are necessary to make the best use of modern hardware.
There is, however, also the fact that multithreading can make life much easier for the programmer. At work I program software to control manufacturing and testing equipment, where a single machine often consists of several positions that work in parallel. Using multiple threads for that kind of software is a natural fit, as the parallel threads model the physical reality quite well. The threads mostly do not need to exchange any data, so the need to synchronize threads is rare, and many of the reasons multithreading is considered difficult therefore do not apply.
Edit:
This is not really about a performance improvement, as the (maybe 5, maybe 10) threads are all mostly sleeping. It is however a huge improvement for the program structure when the various parallel processes can be coded as sequences of actions that do not know of each other. I have very bad memories from the times of 16 bit Windows, when I would create a state machine for each machine position, make sure that nothing would take longer than a few milliseconds, and constantly pass the control to the next state machine. When there were hardware events that needed to be serviced on time, and also computations that took a while (like FFT), then things would get ugly real fast.
Not directly answering your question, but I believe that in the very near future almost every application will need to be multithreaded. Single-core CPU performance is not growing very fast these days, which is compensated for by an increasing number of cores. Thus, if we want our applications to stay at the top performance-wise, we'll need to find ways to utilize all of the computer's cores and keep them busy, which is quite a hard job.
This can be done by telling your programs what to do instead of telling them exactly how. This is a topic I have personally found very interesting recently. Some functional languages, like F#, are able to parallelize many tasks quite easily; well, not THAT easily, but still without the infrastructure needed in more procedural-style environments.
Please take this as additional information to think about, not an attempt to answer your question.
The kind of applications that need to be threaded are the ones where you want to do more than one thing at once. Other than that no application needs to be multi-threaded.
Applications with a large workload which can easily be made parallel. The difficulty of taking your application and doing that should not be underestimated: it is easy when the data you're manipulating is not dependent on other data, but very hard to schedule the cross-thread work when there is a dependency.
Some examples I've done which are good multithreaded candidates..
running scenarios (eg stock derivative pricing, statistics)
bulk updating data files (eg adding a value / entry to 10,000 records)
other mathematical processes
E.g., you want your programs to be multithreaded when you want to utilize multiple cores and/or CPUs, even when the programs don't necessarily do many things at the same time.
EDIT: using multiple processes is the same thing. Which technique to use depends on the platform and how you are going to do communications within your program, etc.
Although frivolous, games in general are becoming more and more threaded every year. At work, our game uses around 10 threads doing physics, AI, animation, rendering, network and IO.
Just want to add that caution must be taken with threads if you are sharing any resources, as this can lead to some very strange behavior: your code may not work correctly, or the threads may even deadlock each other.
A mutex will help you there, as you can use mutex locks to protect critical code regions; an example of a protected code region would be reading or writing memory shared between threads.
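A minimal C# sketch of the idea; C#'s lock statement is effectively a mutex around a critical section:

using System;
using System.Threading;

object gate = new object();
int shared = 0;          // memory shared between the two threads

void Work()
{
    for (int i = 0; i < 100_000; i++)
        lock (gate)      // protected region: only one thread inside at a time
            shared++;    // an unprotected read-modify-write would lose updates
}

var t1 = new Thread(Work);
var t2 = new Thread(Work);
t1.Start(); t2.Start();
t1.Join(); t2.Join();
Console.WriteLine(shared);  // 200000 every time; without the lock, usually less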
just my 2 cents worth.
The main purpose of multithreading is to separate time domains. So the uses are everywhere where you want several things to happen in their own distinctly separate time domains.
HERE IS A PERFECT USE CASE
If you like affiliate marketing, multi-threading is essential. Kick the entire process off via a multi-threaded application.
Download merchant files via FTP, unzip the files, enumerate through each file performing cleanup (like converting EOL terminators from Unix to PC CRLF), then slam each one into SQL Server via bulk inserts; when all threads are complete, create the full-text search indexes for an environment instance to go live tomorrow, and you're done. All automated to kick off at, say, 11:00 pm.
BOOM! Fast as lightning. Heck, you have so much time left you can even download the merchant images locally for the products, save the images as WebP, and set the product URLs to use the local images.
Yep, I did it. Wrote it in C#. Works like a charm. Purchase an AMD Ryzen Threadripper 64-core with 256 GB of memory and fast NVMe drives, get lunch, come back and see it all done; or just stay around and watch all cores peg to 95%+, listen to the PC's fans kick in, warm up the room, and look outside as the neighbors' lights flicker from the power drain while you get shit done.
The future would be to push processing to GPUs as well.
OK, I am pushing it a little with the neighbors' lights flickering, but all else was absolutely true. :)

When is multi-threading not a good idea? [closed]

I was recently working on an application that sent and received messages over Ethernet and Serial. I was then tasked with adding the monitoring of DIO discretes. I thought,
"No reason to interrupt the main thread, which is involved in message processing; I'll just create another thread that monitors DIO."
This decision, however, proved to be poor. Sometimes the main thread would be interrupted between a Send and a Receive serial message. This interruption would disrupt the timing and alas, messages would be lost (forever).
I found another way to monitor the DIO without using another thread and Ethernet and Serial communication were restored to their correct functionality.
The whole fiasco, however, got me thinking. Are there any general guidelines about when not to use multiple threads, and/or does anyone have any more examples of situations when using multiple threads is not a good idea?
EDIT: Based on your comments and after scouring the internet for information, I have composed a blog post entitled When is multi-threading not a good idea?
On a single-processor machine and a desktop application, you use multiple threads so you don't freeze the app, but not for much else.
On a single-processor server and a web-based app, there is no need for multi-threading because the web server handles most of it.
On a multi-processor machine and a desktop app, you are advised to use multiple threads and parallel programming. Make as many threads as there are processors.
On a multi-processor server and a web-based app, there is again no need for multi-threading because the web server handles it.
In short, if you use multiple threads for anything other than un-freezing desktop apps, you will make the app slower on a single-core machine, due to the threads interrupting each other.
Why? Because of hardware context switches: it takes time for the hardware to switch between threads. On a multi-core box, go ahead and use one thread per core and you will see a great ramp-up.
To paraphrase an old quote: A programmer had a problem. He thought, "I know, I'll use threads." Now the programmer has two problems. (Often attributed to JWZ, but it seems to predate his use of it talking about regexes.)
A good rule of thumb is "Don't use threads, unless there's a very compelling reason to use threads." Multiple threads are asking for trouble. Try to find a good way to solve the problem without using multiple threads, and only fall back to using threads if avoiding it is as much trouble as the extra effort to use threads. Also, consider switching to multiple threads if you're running on a multi-core/multi-CPU machine, and performance testing of the single threaded version shows that you need the performance of the extra cores.
Multi-threading is a bad idea if:
Several threads access and update the same resource (set a variable, write to a file), and you don't understand thread safety.
Several threads interact with each other and you don't understand mutexes and similar thread-management tools.
Your program uses static variables (threads typically share them by default).
You have no experience debugging concurrency issues.
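On the mutex point, the classic failure mode is a deadlock: two threads taking two locks in opposite orders. A minimal C# sketch:

using System;
using System.Threading;

object lockA = new object();
object lockB = new object();

var t1 = new Thread(() =>
{
    lock (lockA)
    {
        Thread.Sleep(100);   // widen the window so the deadlock is reliable
        lock (lockB) { }     // waits for t2 to release lockB...
    }
});

var t2 = new Thread(() =>
{
    lock (lockB)
    {
        Thread.Sleep(100);
        lock (lockA) { }     // ...while t2 waits for t1 to release lockA
    }
});

t1.Start(); t2.Start();
t1.Join(); t2.Join();        // never returns: each thread holds what the other wants
Console.WriteLine("unreachable");

The standard cure is to pick one global lock ordering and always acquire locks in that order.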
Actually, multi-threading is not scalable and is hard to debug, so it should not be used in any case if you can avoid it. There are a few cases where it is mandatory: when performance on a multi-CPU machine matters, or when you deal with a server that has a lot of clients taking a long time to answer.
In any other case, you can use alternatives such as a queue plus cron jobs, or similar.
You might want to take a look at the Dan Kegel's "The C10K problem" web page about handling multiple data sources/sinks.
Basically, it is best to use a minimal number of threads, which for sockets can be done in most OSes with some event system (or asynchronously on Windows using IOCP).
When you run into the case where the OS and/or libraries do not offer a way to perform communication in a non-blocking manner, it is best to use a thread-pool to handle them while reporting back to the same event loop.
Example diagram of layout:
Per CPU:
    [*] EVENT LOOP ------ handles non-blocking I/O using OS/library utilities
         |
         \___ THREAD POOL for various blocking events
         \___ THREAD POOL for handling the I/O messages that would take long
Multithreading is bad except in the single case where it is good. This case is when:
The work is CPU-bound, or parts of it are CPU-bound.
The work is parallelisable.
If either or both of these conditions are missing, multithreading is not going to be a winning strategy.
If the work is not CPU-bound, then you are not waiting on threads to finish work, but rather on some external event, such as network activity, for the process to complete its work. With threads, there is the additional cost of context switches between threads, the cost of synchronization (mutexes, etc.), and the irregularity of thread preemption. The alternative in most common use is asynchronous I/O, in which a single thread listens to several I/O ports and acts on whichever happens to be ready now, one at a time. If by some chance these slow channels all happen to become ready at the same time, it might seem like you will experience a slowdown, but in practice this is rarely true: the cost of handling each port individually is often comparable to, or better than, the cost of synchronizing state across multiple threads as each channel is emptied.
Many tasks may be compute bound, but still not practical to use a multithreaded approach because the process must synchronise on the entire state. Such a program cannot benefit from multithreading because no work can be performed concurrently. Fortunately, most programs that require enormous amounts of CPU can be parallelized to some level.
Multi-threading is not a good idea if you need to guarantee precise physical timing (like in your example). Other cons include intensive data exchange between threads. I would say multi-threading is good for really parallel tasks if you don't care much about their relative speed/priority/timing.
A recent application I wrote that had to use multithreading (although not unbounded number of threads) was one where I had to communicate in several directions over two protocols, plus monitoring a third resource for changes. Both protocol libraries required a thread to run the respective event loop in, and when those were accounted for, it was easy to create a third loop for the resource monitoring. In addition to the event loop requirements, the messages going through the wires had strict timing requirements, and one loop couldn't be risked blocking the other, something that was further alleviated by using a multicore CPU (SPARC).
There were further discussions on whether each message processing should be considered a job that was given to a thread from a thread pool, but in the end that was an extension that wasn't worth the work.
All in all, threads should, if possible, only be considered when you can partition the work into well-defined jobs (or series of jobs) such that the semantics are relatively easy to document and implement, and you can put an upper bound on the number of threads you use and that need to interact. Systems where this is best applied are essentially message-passing systems.
In principle, any time there is no overhead for the caller to wait in a queue.
A couple more possible reasons to use threads:
Your platform lacks asynchronous I/O operations, e.g. Windows ME (no completion ports or overlapped I/O, a pain when porting XP applications that use them) or Java 1.3 and earlier.
A third-party library function that can hang, e.g. if a remote server is down, and the library provides no way to cancel the operation and you can't modify it.
Keeping a GUI responsive during intensive processing doesn't always require additional threads. A single callback function is usually sufficient.
If none of the above apply and I still want parallelism for some reason, I prefer to launch an independent process if possible.
I would say multi-threading is generally used to:
Allow data processing in the background while a GUI remains responsive
Split very big data analysis onto multiple processing units so that you can get your results quicker.
When you're receiving data from some hardware and need something to continuously add it to a buffer while some other element decides what to do with it (write to disk, display on a GUI etc.).
So if you're not solving one of those issues, it's unlikely that adding threads will make your life easier. In fact, it'll almost certainly make it harder, because, as others have mentioned, debugging multithreaded applications is considerably more work than a single-threaded solution.
Security might be a reason to avoid using multiple threads (over multiple processes). See Google chrome for an example of multi-process safety features.
Multi-threading is scalable, and will allow your UI to maintain its responsiveness while doing very complicated things in the background. I don't understand where other responses are acquiring their information on multi-threading.
"When shouldn't I multi-thread?" is a misleading framing of your problem. Your real question is: why did multi-threading my application cause serial/Ethernet communications to fail?
The answer to that question will depend on the implementation, which should be discussed in another question. I know for a fact that you can have both ethernet and serial communications happening in a multi-threaded application at the same time as numerous other tasks without causing any data loss.
The one reason to not use multi-threading is:
There is one task, and no user interface with which the task will interfere.
The reasons to use multi-threading are:
Provides superior responsiveness to the user
Performs multiple tasks at the same time to decrease overall execution time
Uses more of the current multi-core CPUs, and multi-multi-cores of the future.
There are three basic methods of multi-threaded programming that make thread safety easy to implement; you only need to use one for success:
Thread Safe Data types passed between threads.
Thread Safe Methods in the threaded object to modify data passed between.
PostMessage capabilities to communicate between threads.
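As a small C# sketch of the second method, here the shared object encapsulates its own synchronization in thread-safe methods, so callers on any thread need no external locking:

using System;
using System.Threading;

var counter = new SafeCounter();
var threads = new Thread[4];
for (int i = 0; i < threads.Length; i++)
{
    threads[i] = new Thread(() =>
    {
        for (int j = 0; j < 100_000; j++)
            counter.Increment();        // safe from any thread
    });
    threads[i].Start();
}
foreach (var t in threads) t.Join();
Console.WriteLine(counter.Value);       // always 400000

// The object's only mutation path is a thread-safe method.
class SafeCounter
{
    private int count;
    public void Increment() => Interlocked.Increment(ref count);
    public int Value => Volatile.Read(ref count);
}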
Are the processes parallel? Is performance a real concern? Are there multiple 'threads' of execution, as on a web server? I don't think there is a definitive answer.
A common source of threading issues is the usual approach employed to synchronize data. Having threads share state and then implementing locking at all the appropriate places is a major source of complexity for both design and debugging. Getting the locking right, balancing stability, performance, and scalability, is always a hard problem to solve; even the most experienced experts get it wrong frequently. Alternative techniques for dealing with threading can alleviate much of this complexity. The Clojure programming language implements several interesting techniques for dealing with concurrency.

Resources