Threading model for data processing in a directed graph

Threading model for data processing in a directed graph - multithreading

I'm going to be designing a simple data analysis tool which processes different kinds of data through a directed graph. The directed graph is somewhat customizable by the user. Each node will consist of logging, analysis, and mathematical operations on the data passing through. The graph is similar in many ways to a neural network except with additional processing at each node. Some nodes do simple operations to the data elements passing through while other nodes have complex algorithms.
How do I multithread the processing in this directed graph such that I can get the result out of the graph in the fastest and most efficient way? Memory is not an issue here, and neither is the time it takes to initialize this task.
I've thought of a couple different methods to multithread the work:
Each thread instance 'follows' each data element entering the start node in this graph. The thread will stay with this data element as it passes through each node, calling the processing method on each node all the way down the tree. This will essentially require a single thread per data element entering the system. Of course, once the data element has been carried through the entire system, the thread will be recycled. The problem here is when two outgoing edges on a node exist--the thread would need to follow both (does this mean pull a new thread from a thread pool?).
Create a thread per node and create a data buffer on each graph edge. The worker thread on the node will continually check to hold data in the instance that one thread takes longer with the data. The problem with this approach is the inherent 'polling' of the buffer for having enough data to start processing it--perhaps a small price to pay for simplifying the data flow for any graph configuration.
Can anyone think of a better way, or which one do you recommend? I'm looking for the least latency through the system and the ability to constantly process a stream of incoming data.
Thanks!
Brett

First of all, it is not a good idea to spawn unlimited amount of threads (e.g. thread per node). Usually you want to have at most 1.5-3 times more threads than your CPU cores (e.g. 6-12 threads for quad-core).
I would recommend to use thread-pool and tasks. In such case your problem can be rephrased as what size your tasks should have.
Both of the methods you mentioned are valid and each has its own pros and cons.
One task per data input should be easy to implement, as the algorithm for graph processing will stay single-threaded. The overhead of context-switching, synchronization and data-passing between threads is almost none.
When there are two outgoing edges on a node, then this single task has to follow both of them. This is a standard part of all algorithms for graph traversal, e.g. depth-first search or breadth-first search.
One task per graph node can improve the latency in case that your graphs have many "branches" that can be processed in parallel. However this approach requires more complex design of graph processing and there will be higher overhead of thread synchronization. Actually the cost of multi-threading might be higher than the benefits gained by parallel processing of the graph.
When there are two outgoing edges on a node, you can create two new task and queue then on the thread-pool. (Or queue one task and continue with processing the other one.)
The more difficult problem is when there are two incoming edges on a node. The task processing the node will have to wait until data for both edges are available.
Conclusion: I would personally start with the first option (one task per data input) and see, how far you can get with it.

Related

Does it make sense to write concurrent program if you have 1 hardware thread? [duplicate]

What is the difference between concurrency and parallelism?

Concurrency is when two or more tasks can start, run, and complete in overlapping time periods. It doesn't necessarily mean they'll ever both be running at the same instant. For example, multitasking on a single-core machine.
Parallelism is when tasks literally run at the same time, e.g., on a multicore processor.
Quoting Sun's Multithreaded Programming Guide:
Concurrency: A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.
Parallelism: A condition that arises when at least two threads are executing simultaneously.

Why the Confusion Exists
Confusion exists because dictionary meanings of both these words are almost the same:
Concurrent: existing, happening, or done at the same time(dictionary.com)
Parallel: very similar and often happening at the same time(merriam webster).
Yet the way they are used in computer science and programming are quite different. Here is my interpretation:
Concurrency: Interruptability
Parallelism: Independentability
So what do I mean by above definitions?
I will clarify with a real world analogy. Let’s say you have to get done 2 very important tasks in one day:
Get a passport
Get a presentation done
Now, the problem is that task-1 requires you to go to an extremely bureaucratic government office that makes you wait for 4 hours in a line to get your passport. Meanwhile, task-2 is required by your office, and it is a critical task. Both must be finished on a specific day.
Case 1: Sequential Execution
Ordinarily, you will drive to passport office for 2 hours, wait in the line for 4 hours, get the task done, drive back two hours, go home, stay awake 5 more hours and get presentation done.
Case 2: Concurrent Execution
But you’re smart. You plan ahead. You carry a laptop with you, and while waiting in the line, you start working on your presentation. This way, once you get back at home, you just need to work 1 extra hour instead of 5.
In this case, both tasks are done by you, just in pieces. You interrupted the passport task while waiting in the line and worked on presentation. When your number was called, you interrupted presentation task and switched to passport task. The saving in time was essentially possible due to interruptability of both the tasks.
Concurrency, IMO, can be understood as the "isolation" property in ACID. Two database transactions are considered isolated if sub-transactions can be performed in each and any interleaved way and the final result is same as if the two tasks were done sequentially. Remember, that for both the passport and presentation tasks, you are the sole executioner.
Case 3: Parallel Execution
Now, since you are such a smart fella, you’re obviously a higher-up, and you have got an assistant. So, before you leave to start the passport task, you call him and tell him to prepare first draft of the presentation. You spend your entire day and finish passport task, come back and see your mails, and you find the presentation draft. He has done a pretty solid job and with some edits in 2 more hours, you finalize it.
Now since, your assistant is just as smart as you, he was able to work on it independently, without needing to constantly ask you for clarifications. Thus, due to the independentability of the tasks, they were performed at the same time by two different executioners.
Still with me? Alright...
Case 4: Concurrent But Not Parallel
Remember your passport task, where you have to wait in the line?
Since it is your passport, your assistant cannot wait in line for you. Thus, the passport task has interruptability (you can stop it while waiting in the line, and resume it later when your number is called), but no independentability (your assistant cannot wait in your stead).
Case 5: Parallel But Not Concurrent
Suppose the government office has a security check to enter the premises. Here, you must remove all electronic devices and submit them to the officers, and they only return your devices after you complete your task.
In this, case, the passport task is neither independentable nor interruptible. Even if you are waiting in the line, you cannot work on something else because you do not have necessary equipment.
Similarly, say the presentation is so highly mathematical in nature that you require 100% concentration for at least 5 hours. You cannot do it while waiting in line for passport task, even if you have your laptop with you.
In this case, the presentation task is independentable (either you or your assistant can put in 5 hours of focused effort), but not interruptible.
Case 6: Concurrent and Parallel Execution
Now, say that in addition to assigning your assistant to the presentation, you also carry a laptop with you to passport task. While waiting in the line, you see that your assistant has created the first 10 slides in a shared deck. You send comments on his work with some corrections. Later, when you arrive back home, instead of 2 hours to finalize the draft, you just need 15 minutes.
This was possible because presentation task has independentability (either one of you can do it) and interruptability (you can stop it and resume it later). So you concurrently executed both tasks, and executed the presentation task in parallel.
Let’s say that, in addition to being overly bureaucratic, the government office is corrupt. Thus, you can show your identification, enter it, start waiting in line for your number to be called, bribe a guard and someone else to hold your position in the line, sneak out, come back before your number is called, and resume waiting yourself.
In this case, you can perform both the passport and presentation tasks concurrently and in parallel. You can sneak out, and your position is held by your assistant. Both of you can then work on the presentation, etc.
Back to Computer Science
In computing world, here are example scenarios typical of each of these cases:
Case 1: Interrupt processing.
Case 2: When there is only one processor, but all executing tasks have wait times due to I/O.
Case 3: Often seen when we are talking about map-reduce or hadoop clusters.
Case 4: I think Case 4 is rare. It’s uncommon for a task to be concurrent but not parallel. But it could happen. For example, suppose your task requires access to a special computational chip that can be accessed through only processor-1. Thus, even if processor-2 is free and processor-1 is performing some other task, the special computation task cannot proceed on processor-2.
Case 5: also rare, but not quite as rare as Case 4. A non-concurrent code can be a critical region protected by mutexes. Once it is started, it must execute to completion. However, two different critical regions can progress simultaneously on two different processors.
Case 6: IMO, most discussions about parallel or concurrent programming are basically talking about Case 6. This is a mix and match of both parallel and concurrent executions.
Concurrency and Go
If you see why Rob Pike is saying concurrency is better, you have to understand what the reason is. You have a really long task in which there are multiple waiting periods where you wait for some external operations like file read, network download. In his lecture, all he is saying is, “just break up this long sequential task so that you can do something useful while you wait.” That is why he talks about different organizations with various gophers.
Now the strength of Go comes from making this breaking really easy with go keyword and channels. Also, there is excellent underlying support in the runtime to schedule these goroutines.
But essentially, is concurrency better than parallelism?
Are apples better than oranges?

I like Rob Pike's talk: Concurrency is not Parallelism (it's better!)
(slides)
(talk)
Rob usually talks about Go and usually addresses the question of Concurrency vs Parallelism in a visual and intuitive explanation! Here is a short summary:
Task: Let's burn a pile of obsolete language manuals! One at a time!
Concurrency: There are many concurrently decompositions of the task! One example:
Parallelism: The previous configuration occurs in parallel if there are at least 2 gophers working at the same time or not.

To add onto what others have said:
Concurrency is like having a juggler juggle many balls. Regardless of how it seems, the juggler is only catching/throwing one ball per hand at a time. Parallelism is having multiple jugglers juggle balls simultaneously.

Say you have a program that has two threads. The program can run in two ways:
Concurrency Concurrency + parallelism
(Single-Core CPU) (Multi-Core CPU)
___ ___ ___
|th1| |th1|th2|
| | | |___|
|___|___ | |___
|th2| |___|th2|
___|___| ___|___|
|th1| |th1|
|___|___ | |___
|th2| | |th2|
In both cases we have concurrency from the mere fact that we have more than one thread running.
If we ran this program on a computer with a single CPU core, the OS would be switching between the two threads, allowing one thread to run at a time.
If we ran this program on a computer with a multi-core CPU then we would be able to run the two threads in parallel - side by side at the exact same time.

Concurrency: If two or more problems are solved by a single processor.
Parallelism: If one problem is solved by multiple processors.

Imagine learning a new programming language by watching a video tutorial. You need to pause the video, apply what been said in code then continue watching. That's concurrency.
Now you're a professional programmer. And you enjoy listening to calm music while coding. That's Parallelism.
As Andrew Gerrand said in GoLang Blog
Concurrency is about dealing with lots of things at once. Parallelism
is about doing lots of things at once.
Enjoy.

I will try to explain with an interesting and easy to understand example. :)
Assume that an organization organizes a chess tournament where 10 players (with equal chess playing skills) will challenge a professional champion chess player. And since chess is a 1:1 game thus organizers have to conduct 10 games in time efficient manner so that they can finish the whole event as quickly as possible.
Hopefully following scenarios will easily describe multiple ways of conducting these 10 games:
1) SERIAL - let's say that the professional plays with each person one by one i.e. starts and finishes the game with one person and then starts the next game with the next person and so on. In other words, they decided to conduct the games sequentially. So if one game takes 10 mins to complete then 10 games will take 100 mins, also assume that transition from one game to other takes 6 secs then for 10 games it will be 54 secs (approx. 1 min).
so the whole event will approximately complete in 101 mins (WORST APPROACH)
2) CONCURRENT - let's say that the professional plays his turn and moves on to the next player so all 10 players are playing simultaneously but the professional player is not with two person at a time, he plays his turn and moves on to the next person. Now assume a professional player takes 6 sec to play his turn and also transition time of a professional player b/w two players is 6 sec so the total transition time to get back to the first player will be 1min (10x6sec). Therefore, by the time he is back to the first person with whom the event was started, 2mins have passed (10xtime_per_turn_by_champion + 10xtransition_time=2mins)
Assuming that all player take 45sec to complete their turn so based on 10mins per game from SERIAL event the no. of rounds before a game finishes should 600/(45+6) = 11 rounds (approx)
So the whole event will approximately complete in 11xtime_per_turn_by_player_&_champion + 11xtransition_time_across_10_players = 11x51 + 11x60sec= 561 + 660 = 1221sec = 20.35mins (approximately)
SEE THE IMPROVEMENT from 101 mins to 20.35 mins (BETTER APPROACH)
3) PARALLEL - let's say organizers get some extra funds and thus decided to invite two professional champion players (both equally capable) and divided the set of same 10 players (challengers) into two groups of 5 each and assigned them to two champions i.e. one group each. Now the event is progressing in parallel in these two sets i.e. at least two players (one in each group) are playing against the two professional players in their respective group.
However within the group the professional player with take one player at a time (i.e. sequentially) so without any calculation you can easily deduce that whole event will approximately complete in 101/2=50.5mins to complete
SEE THE IMPROVEMENT from 101 mins to 50.5 mins (GOOD APPROACH)
4) CONCURRENT + PARALLEL - In the above scenario, let's say that the two champion players will play concurrently (read 2nd point) with the 5 players in their respective groups so now games across groups are running in parallel but within group, they are running concurrently.
So the games in one group will approximately complete in 11xtime_per_turn_by_player_&_champion + 11xtransition_time_across_5_players = 11x51 + 11x30 = 600 + 330 = 930sec = 15.5mins (approximately)
So the whole event (involving two such parallel running group) will approximately complete in 15.5mins
SEE THE IMPROVEMENT from 101 mins to 15.5 mins (BEST APPROACH)
NOTE: in the above scenario if you replace 10 players with 10 similar jobs and two professional players with two CPU cores then again the following ordering will remain true:
SERIAL > PARALLEL > CONCURRENT > CONCURRENT+PARALLEL
(NOTE: this order might change for other scenarios as this ordering highly depends on inter-dependency of jobs, communication needs between jobs and transition overhead between jobs)

Concurrent programming execution has 2 types : non-parallel concurrent programming and parallel concurrent programming (also known as parallelism).
The key difference is that to the human eye, threads in non-parallel concurrency appear to run at the same time but in reality they don't. In non - parallel concurrency threads rapidly switch and take turns to use the processor through time-slicing.
While in parallelism there are multiple processors available so, multiple threads can run on different processors at the same time.
Reference: Introduction to Concurrency in Programming Languages

Simple example:
Concurrent is: "Two queues accessing one ATM machine"
Parallel is: "Two queues and two ATM machines"

Parallelism is simultaneous execution of processes on a multiple cores per CPU or multiple CPUs (on a single motherboard).
Concurrency is when Parallelism is achieved on a single core/CPU by using scheduling algorithms that divides the CPU’s time (time-slice). Processes are interleaved.
Units:
1 or many cores in a single CPU (pretty much all modern day processors)
1 or many CPUs on a motherboard (think old school servers)
1 application is 1 program (think Chrome browser)
1 program can have 1 or many processes (think each Chrome browser tab is a process)
1 process can have 1 or many threads from 1 program (Chrome tab playing Youtube video in 1 thread, another thread spawned for comments
section, another for users login info)
Thus, 1 program can have 1 or many threads of execution
1 process is thread(s)+allocated memory resources by OS (heap, registers, stack, class memory)

They solve different problems. Concurrency solves the problem of having scarce CPU resources and many tasks. So, you create threads or independent paths of execution through code in order to share time on the scarce resource. Up until recently, concurrency has dominated the discussion because of CPU availability.
Parallelism solves the problem of finding enough tasks and appropriate tasks (ones that can be split apart correctly) and distributing them over plentiful CPU resources. Parallelism has always been around of course, but it's coming to the forefront because multi-core processors are so cheap.

concurency:
multiple execution flows with the potential to share resources
Ex:
two threads competing for a I/O port.
paralelism:
splitting a problem in multiple similar chunks.
Ex:
parsing a big file by running two processes on every half of the file.

Concurrency => When multiple tasks are performed in overlapping time periods with shared resources (potentially maximizing the resources utilization).
Parallel => when single task is divided into multiple simple independent sub-tasks which can be performed simultaneously.

Concurrency vs Parallelism
Rob Pike in 'Concurrency Is Not Parallelism'
Concurrency is about dealing with lots of things at once.
Parallelism is about doing lots of things at once.
[Concurrency theory]
Concurrency - handles several tasks at once
Parallelism - handles several thread at once
My vision of concurrency and parallelism
[Sync vs Async]
[Swift Concurrency]

If at all you want to explain this to a 9-year-old.

Think of it as servicing queues where server can only serve the 1st job in a queue.
1 server , 1 job queue (with 5 jobs) -> no concurrency, no parallelism (Only one job is being serviced to completion, the next job in the queue has to wait till the serviced job is done and there is no other server to service it)
1 server, 2 or more different queues (with 5 jobs per queue) -> concurrency (since server is sharing time with all the 1st jobs in queues, equally or weighted) , still no parallelism since at any instant, there is one and only job being serviced.
2 or more servers , one Queue -> parallelism ( 2 jobs done at the same instant) but no concurrency ( server is not sharing time, the 3rd job has to wait till one of the server completes.)
2 or more servers, 2 or more different queues -> concurrency and parallelism
In other words, concurrency is sharing time to complete a job, it MAY take up the same time to complete its job but at least it gets started early. Important thing is , jobs can be sliced into smaller jobs, which allows interleaving.
Parallelism is achieved with just more CPUs , servers, people etc that run in parallel.
Keep in mind, if the resources are shared, pure parallelism cannot be achieved, but this is where concurrency would have it's best practical use, taking up another job that doesn't need that resource.

I really like Paul Butcher's answer to this question (he's the writer of Seven Concurrency Models in Seven Weeks):
Although they’re often confused, parallelism and concurrency are
different things. Concurrency is an aspect of the problem domain—your
code needs to handle multiple simultaneous (or near simultaneous)
events. Parallelism, by contrast, is an aspect of the solution
domain—you want to make your program run faster by processing
different portions of the problem in parallel. Some approaches are
applicable to concurrency, some to parallelism, and some to both.
Understand which you’re faced with and choose the right tool for the
job.

In electronics serial and parallel represent a type of static topology, determining the actual behaviour of the circuit. When there is no concurrency, parallelism is deterministic.
In order to describe dynamic, time-related phenomena, we use the terms sequential and concurrent. For example, a certain outcome may be obtained via a certain sequence of tasks (eg. a recipe). When we are talking with someone, we are producing a sequence of words. However, in reality, many other processes occur in the same moment, and thus, concur to the actual result of a certain action. If a lot of people is talking at the same time, concurrent talks may interfere with our sequence, but the outcomes of this interference are not known in advance. Concurrency introduces indeterminacy.
The serial/parallel and sequential/concurrent characterization are orthogonal. An example of this is in digital communication. In a serial adapter, a digital message is temporally (i.e. sequentially) distributed along the same communication line (eg. one wire). In a parallel adapter, this is divided also on parallel communication lines (eg. many wires), and then reconstructed on the receiving end.
Let us image a game, with 9 children. If we dispose them as a chain, give a message at the first and receive it at the end, we would have a serial communication. More words compose the message, consisting in a sequence of communication unities.
I like ice-cream so much. > X > X > X > X > X > X > X > X > X > ....
This is a sequential process reproduced on a serial infrastructure.
Now, let us image to divide the children in groups of 3. We divide the phrase in three parts, give the first to the child of the line at our left, the second to the center line's child, etc.
I like ice-cream so much. > I like > X > X > X > .... > ....
> ice-cream > X > X > X > ....
> so much > X > X > X > ....
This is a sequential process reproduced on a parallel infrastructure (still partially serialized although).
In both cases, supposing there is a perfect communication between the children, the result is determined in advance.
If there are other persons that talk to the first child at the same time as you, then we will have concurrent processes. We do no know which process will be considered by the infrastructure, so the final outcome is non-determined in advance.

I'm going to offer an answer that conflicts a bit with some of the popular answers here. In my opinion, concurrency is a general term that includes parallelism. Concurrency applies to any situation where distinct tasks or units of work overlap in time. Parallelism applies more specifically to situations where distinct units of work are evaluated/executed at the same physical time. The raison d'etre of parallelism is speeding up software that can benefit from multiple physical compute resources. The other major concept that fits under concurrency is interactivity. Interactivity applies when the overlapping of tasks is observable from the outside world. The raison d'etre of interactivity is making software that is responsive to real-world entities like users, network peers, hardware peripherals, etc.
Parallelism and interactivity are almost entirely independent dimension of concurrency. For a particular project developers might care about either, both or neither. They tend to get conflated, not least because the abomination that is threads gives a reasonably convenient primitive to do both.
A little more detail about parallelism:
Parallelism exists at very small scales (e.g. instruction-level parallelism in processors), medium scales (e.g. multicore processors) and large scales (e.g. high-performance computing clusters). Pressure on software developers to expose more thread-level parallelism has increased in recent years, because of the growth of multicore processors. Parallelism is intimately connected to the notion of dependence. Dependences limit the extent to which parallelism can be achieved; two tasks cannot be executed in parallel if one depends on the other (Ignoring speculation).
There are lots of patterns and frameworks that programmers use to express parallelism: pipelines, task pools, aggregate operations on data structures ("parallel arrays").
A little more detail about interactivity:
The most basic and common way to do interactivity is with events (i.e. an event loop and handlers/callbacks). For simple tasks events are great. Trying to do more complex tasks with events gets into stack ripping (a.k.a. callback hell; a.k.a. control inversion). When you get fed up with events you can try more exotic things like generators, coroutines (a.k.a. Async/Await), or cooperative threads.
For the love of reliable software, please don't use threads if what you're going for is interactivity.
Curmudgeonliness
I dislike Rob Pike's "concurrency is not parallelism; it's better" slogan. Concurrency is neither better nor worse than parallelism. Concurrency includes interactivity which cannot be compared in a better/worse sort of way with parallelism. It's like saying "control flow is better than data".

From the book Linux System Programming by Robert Love:
Concurrency, Parallelism, and Races
Threads create two related but distinct phenomena: concurrency and
parallelism. Both are bittersweet, touching on the costs of threading
as well as its benefits. Concurrency is the ability of two or more
threads to execute in overlapping time periods. Parallelism is
the ability to execute two or more threads simultaneously.
Concurrency can occur without parallelism: for example, multitasking
on a single processor system. Parallelism (sometimes emphasized as
true parallelism) is a specific form of concurrency requiring multiple processors (or a single processor capable of multiple engines
of execution, such as a GPU). With concurrency, multiple threads make
forward progress, but not necessarily simultaneously. With
parallelism, threads literally execute in parallel, allowing
multithreaded programs to utilize multiple processors.
Concurrency is a programming pattern, a way of approaching problems.
Parallelism is a hardware feature, achievable through concurrency.
Both are useful.
This explanation is consistent with the accepted answer. Actually the concepts are far simpler than we think. Don't think them as magic. Concurrency is about a period of time, while Parallelism is about exactly at the same time, simultaneously.

Concurrency is the generalized form of parallelism. For example parallel program can also be called concurrent but reverse is not true.
Concurrent execution is possible on single processor (multiple threads, managed by scheduler or thread-pool)
Parallel execution is not possible on single processor but on multiple processors. (One process per processor)
Distributed computing is also a related topic and it can also be called concurrent computing but reverse is not true, like parallelism.
For details read this research paper
Concepts of Concurrent Programming

I really liked this graphical representation from another answer - I think it answers the question much better than a lot of the above answers
Parallelism vs Concurrency
When two threads are running in parallel, they are both running at the same time. For example, if we have two threads, A and B, then their parallel execution would look like this:
CPU 1: A ------------------------->
CPU 2: B ------------------------->
When two threads are running concurrently, their execution overlaps. Overlapping can happen in one of two ways: either the threads are executing at the same time (i.e. in parallel, as above), or their executions are being interleaved on the processor, like so:
CPU 1: A -----------> B ----------> A -----------> B ---------->
So, for our purposes, parallelism can be thought of as a special case of concurrency
Source: Another answer here
Hope that helps.

"Concurrency" is when there are multiple things in progress.
"Parallelism" is when concurrent things are progressing at the same time.
Examples of concurrency without parallelism:
Multiple threads on a single core.
Multiple messages in a Win32 message queue.
Multiple SqlDataReaders on a MARS connection.
Multiple JavaScript promises in a browser tab.
Note, however, that the difference between concurrency and parallelism is often a matter of perspective. The above examples are non-parallel from the perspective of (observable effects of) executing your code. But there is instruction-level parallelism even within a single core. There are pieces of hardware doing things in parallel with CPU and then interrupting the CPU when done. GPU could be drawing to screen while you window procedure or event handler is being executed. The DBMS could be traversing B-Trees for the next query while you are still fetching the results of the previous one. Browser could be doing layout or networking while your Promise.resolve() is being executed. Etc, etc...
So there you go. The world is as messy as always ;)

The simplest and most elegant way of understanding the two in my opinion is this. Concurrency allows interleaving of execution and so can give the illusion of parallelism. This means that a concurrent system can run your Youtube video alongside you writing up a document in Word, for example. The underlying OS, being a concurrent system, enables those tasks to interleave their execution. Because computers execute instructions so quickly, this gives the appearance of doing two things at once.
Parallelism is when such things really are in parallel. In the example above, you might find the video processing code is being executed on a single core, and the Word application is running on another. Note that this means that a concurrent program can also be in parallel! Structuring your application with threads and processes enables your program to exploit the underlying hardware and potentially be done in parallel.
Why not have everything be parallel then? One reason is because concurrency is a way of structuring programs and is a design decision to facilitate separation of concerns, whereas parallelism is often used in the name of performance. Another is that some things fundamentally cannot fully be done in parallel. An example of this would be adding two things to the back of a queue - you cannot insert both at the same time. Something must go first and the other behind it, or else you mess up the queue. Although we can interleave such execution (and so we get a concurrent queue), you cannot have it parallel.
Hope this helps!

"Concurrent" is doing things -- anything -- at the same time. They could be different things, or the same thing. Despite the accepted answer, which is lacking, it's not about "appearing to be at the same time." It's really at the same time. You need multiple CPU cores, either using shared memory within one host, or distributed memory on different hosts, to run concurrent code. Pipelines of 3 distinct tasks that are concurrently running at the same time are an example: Task-level-2 has to wait for units completed by task-level-1, and task-level-3 has to wait for units of work completed by task-level-2. Another example is concurrency of 1-producer with 1-consumer; or many-producers and 1-consumer; readers and writers; et al.
"Parallel" is doing the same things at the same time. It is concurrent, but furthermore it is the same behavior happening at the same time, and most typically on different data. Matrix algebra can often be parallelized, because you have the same operation running repeatedly: For example the column sums of a matrix can all be computed at the same time using the same behavior (sum) but on different columns. It is a common strategy to partition (split up) the columns among available processor cores, so that you have close to the same quantity of work (number of columns) being handled by each processor core. Another way to split up the work is bag-of-tasks where the workers who finish their work go back to a manager who hands out the work and get more work dynamically until everything is done. Ticketing algorithm is another.
Not just numerical code can be parallelized. Files too often can be processed in parallel. In a natural language processing application, for each of the millions of document files, you may need to count the number of tokens in the document. This is parallel, because you are counting tokens, which is the same behavior, for every file.
In other words, parallelism is when same behavior is being performed concurrently. Concurrently means at the same time, but not necessarily the same behavior. Parallel is a particular kind of concurrency where the same thing is happening at the same time.
Terms for example will include atomic instructions, critical sections, mutual exclusion, spin-waiting, semaphores, monitors, barriers, message-passing, map-reduce, heart-beat, ring, ticketing algorithms, threads, MPI, OpenMP.
Gregory Andrews' work is a top textbook on it: Multithreaded, Parallel, and Distributed Programming.

Concurrency can involve tasks run simultaneously or not (they can indeed be run in separate processors/cores but they can as well be run in "ticks"). What is important is that concurrency always refer to doing a piece of one greater task. So basically it's a part of some computations. You have to be smart about what you can do simultaneously and what not to and how to synchronize.
Parallelism means that you're just doing some things simultaneously. They don't need to be a part of solving one problem. Your threads can, for instance, solve a single problem each. Of course synchronization stuff also applies but from different perspective.

Parallelism:
Having multiple threads do similar task which are independent of each other in terms of data and resource that they require to do so. Eg: Google crawler can spawn thousands of threads and each thread can do it's task independently.
Concurrency:
Concurrency comes into picture when you have shared data, shared resource among the threads. In a transactional system this means you have to synchronize the critical section of the code using some techniques like Locks, semaphores, etc.

Explanation from this source was helpful for me:
Concurrency is related to how an application handles multiple tasks it
works on. An application may process one task at at time
(sequentially) or work on multiple tasks at the same time
(concurrently).
Parallelism on the other hand, is related to how an application
handles each individual task. An application may process the task
serially from start to end, or split the task up into subtasks which
can be completed in parallel.
As you can see, an application can be concurrent, but not parallel.
This means that it processes more than one task at the same time, but
the tasks are not broken down into subtasks.
An application can also be parallel but not concurrent. This means
that the application only works on one task at a time, and this task
is broken down into subtasks which can be processed in parallel.
Additionally, an application can be neither concurrent nor parallel.
This means that it works on only one task at a time, and the task is
never broken down into subtasks for parallel execution.
Finally, an application can also be both concurrent and parallel, in
that it both works on multiple tasks at the same time, and also breaks
each task down into subtasks for parallel execution. However, some of
the benefits of concurrency and parallelism may be lost in this
scenario, as the CPUs in the computer are already kept reasonably busy
with either concurrency or parallelism alone. Combining it may lead to
only a small performance gain or even performance loss.

Concurrent programming regards operations that appear to overlap and is primarily concerned with the complexity that arises due to non-deterministic control flow. The quantitative costs associated with concurrent programs are typically both throughput and latency. Concurrent programs are often IO bound but not always, e.g. concurrent garbage collectors are entirely on-CPU. The pedagogical example of a concurrent program is a web crawler. This program initiates requests for web pages and accepts the responses concurrently as the results of the downloads become available, accumulating a set of pages that have already been visited. Control flow is non-deterministic because the responses are not necessarily received in the same order each time the program is run. This characteristic can make it very hard to debug concurrent programs. Some applications are fundamentally concurrent, e.g. web servers must handle client connections concurrently. Erlang is perhaps the most promising upcoming language for highly concurrent programming.
Parallel programming concerns operations that are overlapped for the specific goal of improving throughput. The difficulties of concurrent programming are evaded by making control flow deterministic. Typically, programs spawn sets of child tasks that run in parallel and the parent task only continues once every subtask has finished. This makes parallel programs much easier to debug. The hard part of parallel programming is performance optimization with respect to issues such as granularity and communication. The latter is still an issue in the context of multicores because there is a considerable cost associated with transferring data from one cache to another. Dense matrix-matrix multiply is a pedagogical example of parallel programming and it can be solved efficiently by using Straasen's divide-and-conquer algorithm and attacking the sub-problems in parallel. Cilk is perhaps the most promising language for high-performance parallel programming on shared-memory computers (including multicores).
Copied from my answer: https://stackoverflow.com/a/3982782

Managing the TPL Queue

I've got a service that runs scans of various servers. The networks in question can be huge (hundreds of thousands of network nodes).
The current version of the software is using a queueing/threading architecture designed by us which works but isn't as efficient as it could be (not least of which because jobs can spawn children which isn't handled well)
V2 is coming up and I'm considering using the TPL. It seems like it should be ideally suited.
I've seen this question, the answer to which implies there's no limit to the tasks TPL can handle. In my simple tests (Spin up 100,000 tasks and give them to TPL), TPL barfed fairly early on with an Out-Of-Memory exception (fair enough - especially on my dev box).
The Scans take a variable length of time but 5 mins/task is a good average.
As you can imagine, scans for huge networks can take a considerable length of time, even on beefy servers.
I've already got a framework in place which allows the scan jobs (stored in a Db) to be split between multiple scan servers, but the question is how exactly I should pass work to the TPL on a specific server.
Can I monitor the size of TPL's queue and (say) top it up if it falls below a couple of hundred entries? Is there a downside to doing this?
I also need to handle the situation where a scan needs to be paused. This is seems easier to do by not giving the work to TPL than by cancelling/resetting tasks which may already be partially processed.
All of the initial tasks can be run in any order. Children must be run after the parent has started executing but since the parent spawns them, this shouldn't ever be a problem. Children can be run in any order. Because of this, I'm currently envisioning that child tasks be written back to the Db not spawned directly into TPL. This would allow other servers to "work steal" if required.
Has anyone had any experience with using the TPL in this way? Are there any considerations I need to be aware of?

TPL is about starting small units of work and running them in parallel. It is not about monitoring, pausing, or throttling this work.
You should see TPL as a low-level tool to start "work" and to synchronize threads.
Key point: TPL tasks != logical tasks. Logical tasks are in your case scan-tasks ("scan an ip-range from x to y"). Such a task should not correspond to a physical task "System.Threading.Task" because the two are different concepts.
You need to schedule, orchestrate, monitor and pause the logical tasks yourself because TPL does not understand them and cannot be made to.
Now the more practical concerns:
TPL can certainly start 100k tasks without OOM. The OOM happened because your tasks' code exhausted memory.
Scanning networks sounds like a great case for asynchronous code because while you are scanning you are likely to wait on results while having a great degree of parallelism. You probably don't want to have 500 threads in your process all waiting for a network packet to arrive. Asynchronous tasks fit well with the TPL because every task you run becomes purely CPU-bound and small. That is the sweet spot for TPL.

How to articulate the difference between asynchronous and parallel programming?

Many platforms promote asynchrony and parallelism as means for improving responsiveness. I understand the difference generally, but often find it difficult to articulate in my own mind, as well as for others.
I am a workaday programmer and use async & callbacks fairly often. Parallelism feels exotic.
But I feel like they are easily conflated, especially at the language design level. Would love a clear description of how they relate (or don't), and the classes of programs where each is best applied.

When you run something asynchronously it means it is non-blocking, you execute it without waiting for it to complete and carry on with other things. Parallelism means to run multiple things at the same time, in parallel. Parallelism works well when you can separate tasks into independent pieces of work.
Take for example rendering frames of a 3D animation. To render the animation takes a long time so if you were to launch that render from within your animation editing software you would make sure it was running asynchronously so it didn't lock up your UI and you could continue doing other things. Now, each frame of that animation can also be considered as an individual task. If we have multiple CPUs/Cores or multiple machines available, we can render multiple frames in parallel to speed up the overall workload.

I believe the main distinction is between concurrency and parallelism.
Async and Callbacks are generally a way (tool or mechanism) to express concurrency i.e. a set of entities possibly talking to each other and sharing resources.
In the case of async or callback communication is implicit while sharing of resources is optional (consider RMI where results are computed in a remote machine).
As correctly noted this is usually done with responsiveness in mind; to not wait for long latency events.
Parallel programming has usually throughput as the main objective while latency, i.e. the completion time for a single element, might be worse than a equivalent sequential program.
To better understand the distinction between concurrency and parallelism I am going to quote from Probabilistic models for concurrency of Daniele Varacca which is a good set of notes for theory of concurrency:
A model of computation is a model for concurrency when it is able to represent systems as composed of independent autonomous components, possibly communicating with each other. The notion of concurrency should not be confused with the notion of parallelism. Parallel computations usually involve a central control which distributes the work among several processors. In concurrency we stress the independence of the components, and the fact that they communicate with each other. Parallelism is like ancient Egypt, where the Pharaoh decides and the slaves work. Concurrency is like modern Italy, where everybody does what they want, and all use mobile phones.
In conclusion, parallel programming is somewhat a special case of concurrency where separate entities collaborate to obtain high performance and throughput (generally).
Async and Callbacks are just a mechanism that allows the programmer to express concurrency.
Consider that well-known parallel programming design patterns such as master/worker or map/reduce are implemented by frameworks that use such lower level mechanisms (async) to implement more complex centralized interactions.

This article explains it very well: http://urda.cc/blog/2010/10/04/asynchronous-versus-parallel-programming
It has this about asynchronous programming:
Asynchronous calls are used to prevent “blocking” within an application. [Such a] call will spin-off in an already existing thread (such as an I/O thread) and do its task when it can.
this about parallel programming:
In parallel programming you still break up work or tasks, but the key differences is that you spin up new threads for each chunk of work
and this in summary:
asynchronous calls will use threads already in use by the system and parallel programming requires the developer to break the work up, spinup, and teardown threads needed.

async: Do this by yourself somewhere else and notify me when you complete(callback). By the time i can continue to do my thing.
parallel: Hire as many guys(threads) as you wish and split the job to them to complete quicker and let me know(callback) when you complete. By the time i might continue to do my other stuff.
the main difference is parallelism mostly depends on hardware.

My basic understanding is:
Asynchonous programming solves the problem of waiting around for an expensive operation to complete before you can do anything else. If you can get other stuff done while you're waiting for the operation to complete then that's a good thing. Example: keeping a UI running while you go and retrieve more data from a web service.
Parallel programming is related but is more concerned with breaking a large task into smaller chunks that can be computed at the same time. The results of the smaller chunks can then be combined to produce the overall result. Example: ray-tracing where the colour of individual pixels is essentially independent.
It's probably more complicated than that, but I think that's the basic distinction.

I tend to think of the difference in these terms:
Asynchronous: Go away and do this task, when you're finished come back and tell me and bring the results. I'll be getting on with other things in the mean time.
Parallel: I want you to do this task. If it makes it easier, get some folks in to help. This is urgent though, so I'll wait here until you come back with the results. I can do nothing else until you come back.
Of course an asynchronous task might make use of parallelism, but the differentiation - to my mind at least - is whether you get on with other things while the operation is being carried out or if you stop everything completely until the results are in.

It is a question of order of execution.
If A is asynchronous with B, then I cannot predict beforehand when subparts of A will happen with respect to subparts of B.
If A is parallel with B, then things in A are happening at the same time as things in B. However, an order of execution may still be defined.
Perhaps the difficulty is that the word asynchronous is equivocal.
I execute an asynchronous task when I tell my butler to run to the store for more wine and cheese, and then forget about him and work on my novel until he knocks on the study door again. Parallelism is happening here, but the butler and I are engaged in fundamentally different tasks and of different social classes, so we don't apply that label here.
My team of maids is working in parallel when each of them is washing a different window.
My race car support team is asynchronously parallel in that each team works on a different tire and they don't need to communicate with each other or manage shared resources while they do their job.
My football (aka soccer) team does parallel work as each player independently processes information about the field and moves about on it, but they are not fully asynchronous because they must communicate and respond to the communication of others.
My marching band is also parallel as each player reads music and controls their instrument, but they are highly synchronous: they play and march in time to each other.
A cammed gatling gun could be considered parallel, but everything is 100% synchronous, so it is as though one process is moving forward.

Why Asynchronous ?
With today's application's growing more and more connected and also potentially
long running tasks or blocking operations such as Network I/O or Database Operations.So it's very important to hide the latency of these operations by starting them in background and returning back to the user interface quickly as possible. Here Asynchronous come in to the picture, Responsiveness.
Why parallel programming?
With today's data sets growing larger and computations growing more complex. So it's very important to reduce the execution time of these CPU-bound operations, in this case, by dividing the workload into chunks and then executing those chunks simultaneously. We can call this as "Parallel" .
Obviously it will give high Performance to our application.

Asynchronous
Let's say you are the point of contact for your client and you need to be responsive i.e. you need to share status, complexity of operation, resources required etc whenever asked. Now you have a time-consuming operation to be done and hence cannot take this up as you need to be responsive to the client 24/7. Hence, you delegate the time-consuming operation to someone else so that you can be responsive. This is asynchronous.
Parallel programming
Let's say you have a task to read, say, 100 lines from a text file, and reading one line takes 1 second. Hence, you'll require 100 seconds to read the text file. Now you're worried that the client must wait for 100 seconds for the operation to finish. Hence you create 9 more clones and make each of them read 10 lines from the text file. Now the time taken is only 10 seconds to read 100 lines. Hence you have better performance.
To sum up, asynchronous coding is done to achieve responsiveness and parallel programming is done for performance.

Asynchronous: Running a method or task in background, without blocking. May not necessorily run on a separate thread. Uses Context Switching / time scheduling.
Parallel Tasks: Each task runs parallally. Does not use context switching / time scheduling.

I came here fairly comfortable with the two concepts, but with something not clear to me about them.
After reading through some of the answers, I think I have a correct and helpful metaphor to describe the difference.
If you think of your individual lines of code as separate but ordered playing cards (stop me if I am explaining how old-school punch cards work), then for each separate procedure written, you will have a unique stack of cards (don't copy & paste!) and the difference between what normally goes on when run code normally and asynchronously depends on whether you care or not.
When you run the code, you hand the OS a set of single operations (that your compiler or interpreter broke your "higher" level code into) to be passed to the processor. With one processor, only one line of code can be executed at any one time. So, in order to accomplish the illusion of running multiple processes at the same time, the OS uses a technique in which it sends the processor only a few lines from a given process at a time, switching between all the processes according to how it sees fit. The result is multiple processes showing progress to the end user at what seems to be the same time.
For our metaphor, the relationship is that the OS always shuffles the cards before sending them to the processor. If your stack of cards doesn't depend on another stack, you don't notice that your stack stopped getting selected from while another stack became active. So if you don't care, it doesn't matter.
However, if you do care (e.g., there are multiple processes - or stacks of cards - that do depend on each other), then the OS's shuffling will screw up your results.
Writing asynchronous code requires handling the dependencies between the order of execution regardless of what that ordering ends up being. This is why constructs like "call-backs" are used. They say to the processor, "the next thing to do is tell the other stack what we did". By using such tools, you can be assured that the other stack gets notified before it allows the OS to run any more of its instructions. ("If called_back == false: send(no_operation)" - not sure if this is actually how it is implemented, but logically, I think it is consistent.)
For parallel processes, the difference is that you have two stacks that don't care about each other and two workers to process them. At the end of the day, you may need to combine the results from the two stacks, which would then be a matter of synchronicity but, for execution, you don't care again.
Not sure if this helps but, I always find multiple explanations helpful. Also, note that asynchronous execution is not constrained to an individual computer and its processors. Generally speaking, it deals with time, or (even more generally speaking) an order of events. So if you send dependent stack A to network node X and its coupled stack B to Y, the correct asynchronous code should be able to account for the situation as if it was running locally on your laptop.

Generally, there are only two ways you can do more than one thing each time. One is asynchronous, the other is parallel.
From the high level, like the popular server NGINX and famous Python library Tornado, they both fully utilize asynchronous paradigm which is Single thread server could simultaneously serve thousands of clients (some IOloop and callback). Using ECF(exception control follow) which could implement the asynchronous programming paradigm. so asynchronous sometimes doesn't really do thing simultaneous, but some io bound work, asynchronous could really promotes the performance.
The parallel paradigm always refers multi-threading, and multiprocessing. This can fully utilize multi-core processors, do things really simultaneously.

Summary of all above answers
parallel computing:
▪ solves throughput issue.
Concerned with breaking a large task into smaller chunks
▪ is machine related (multi machine/core/cpu/processor needed), eg: master slave, map reduce.
Parallel computations usually involve a central control which distributes the work among several processors
asynchronous:
▪ solves latency issue
ie, the problem of 'waiting around' for an expensive operation to complete before you can do anything else
▪ is thread related (multi thread needed)
Threading (using Thread, Runnable, Executor) is one fundamental way to perform asynchronous operations in Java

Multithreading vs virtual process

There's three types of control flow model,
single threaded, virtual process and multithreaded process.
here's what has written in the power point which I study form
Virtual processes. This is based on a
single threaded model but gives the
appearance of concurrent execution. a
controller component schedules the
execution of the other components and
gives them control. The scheduling can
be performed periodically or based on
events. This model is based on a
logical decomposition of activities in
simple steps whose execution requires
only short intervals of time.
I couldn't understand it and couldn't understand the difference between multithreading process and vp.
can some one help?
EDIT here the chapter of the book which I mention the section above form
http://www.mediafire.com/?ru82i0nvp12qw6t

This term "virtual process" is unusual but based on your description I can give 2 real-world examples of using each. For multithreading, imagine you have a lot of data in memory and want to perform some calculations on it... you can split that data up and have seperate threads (1 per CPU core, ideally) simultaneously working on different chunks of the data. This way, the calculations will be done faster based on how many threads you create. For 'virtual process', imagine you need to retrieve 20 files from remote servers... most of the CPU 'work' involved in this is just sitting around waiting for bytes to arrive from the remote network. Creating separate threads to download each of these files would not make the files arrive any faster. If anything, having extra threads that the OS needs to constantly switch between (and it will switch a LOT because most of the time each thread will just say 'im still waiting' and then cede control). So, in this case it's better to have a single thread doing all of the downloading, cycling internally between each of the download tasks to read incomming data off of their buffers.

Your virtual process looks to me like event driven programming. Google for eg. 'threads vs events', the first link you get is quite fine comparison.
EDIT: Here's another comparison I've found in bookmarks.

What would be a realworld example of use of a barrier in a multithreaded application?

JDK's concurrency package, Boost's thread library, Perl's Thread library (not in Python though) all implement barrier, I haven't come across a need for using a barrier, so wondering what would a typical use case be in multi-threaded applications.

Barriers can be used all over the place through contrived examples but you'll often seem them in a scatter/reduce method where the results of the different threads are all required before proceeding.
If you wanted to parallelize a sort for instance you could split the list n times and start n threads to sort their section and pause, when they're all finished they would die letting the parent know that it's finally okay to combine the sorted chunks. (I know there are better ways but it's one implementation).
The other place I've seen it is in parallelized networking where you have to send a certain amount of data per payload. So the interface will start up n buckets and wait for them all to fill before sending out the transmission. When you think about a partitioned T1 line it sorta makes sense, sending one burst of data over the 64 multiplexed partitions would be better than sending 1 partition's data (which essentially costs the same since the packet has to be padded with 0's.)
Hope those are some things to get you thinking about it the problem!

Example: a set of threads work concurrently to compute a result-set and the said result-set (in part/total) is required as input for the next stage of processing to some/all threads at the "barrier".
A barrier makes it easier to synchronize multiple threads without having to craft a solution around multiple conditions & mutexes.
I can't say I have seen barriers often though. At some point, as the number of threads grows, it might be worthwhile considering a more "decoupled" system as to manage possible dead-locks.

MSDN: A Barrieris an object that prevents individual tasks in a parallel operation from continuing until all tasks reach the barrier. It is useful when a parallel operation occurs in
phases, and each phase requires synchronization between tasks.
Found here

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string