How to make an event that takes time nonblocking? - node.js

I was thinking - since node.js runs in a single thread, what if I want to do some algorithmically complex computation (hard_and_complex_function()) that has nothing to do with I/O but takes a LOT of time? Can I make it non-blocking? Isn't that a disadvantage compared to multi-threading technologies, where I can simply run it in a separate thread?

While you are correct with regard to threads, you have at least two options that might address your problem at hand:
Use process.nextTick() to yield the CPU at appropriate points in the long computation (see the sketch below).
Use a separate process (using child_process or Cluster) to carry out your long computation.
You may also want, for future use, to take a look at Generators and yield coming in ES6.
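To make the first option concrete, here is a minimal sketch (not part of the original answer) that chunks a long computation and yields between chunks. Note that in current Node versions setImmediate is usually preferred over process.nextTick for this, since nextTick callbacks run before pending I/O events:

    // Hypothetical example: sum a huge array in bounded chunks so the
    // event loop can service I/O between chunks.
    function sumInChunks(numbers, callback, chunkSize = 10000) {
      let total = 0;
      let i = 0;
      function doChunk() {
        const end = Math.min(i + chunkSize, numbers.length);
        for (; i < end; i++) total += numbers[i]; // a bounded slice of work
        if (i < numbers.length) {
          setImmediate(doChunk); // yield to the event loop, then continue
        } else {
          callback(total); // all done
        }
      }
      doChunk();
    }

    sumInChunks(new Array(1e7).fill(1), (sum) => console.log('sum =', sum));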

There is a solution available for tackling long-running computations in Node.js. Have a look at the library below:
https://github.com/xk/node-threads-a-gogo

Related

Why do we need both threading and asynchronous calls

My understanding is that threads exist as a way of doing several things in parallel that share the same address space but each has its individual stack. Asynchronous programming is basically a way of using fewer threads. I don't understand why it's undesirable to just have blocking calls and a separate thread for each blocked command?
For example, suppose I want to scrape a large part of the web. A presumably uncontroversial implementation might be to have a large number of asynchronous loops. Each loop would ask for a webpage, wait to avoid overloading the remote server, then ask for another webpage from the same website until done. The loops would then be executed on a much smaller number of threads (which is fine because they mostly wait). So to restate the question, I don't see why it's any cheaper to e.g. maintain a threadpool within the language runtime than it would be to just have one (mostly blocked) OS thread per loop and let the operating system deal with the complexity of scheduling the work? After all, if piling two different schedulers on top of each other is so good, it can still be implemented that way in the OS :-)
It seems obvious the answer is something like "threads are expensive". However, a thread just needs to keep track of where it has got to before it was interrupted the last time. This is pretty much exactly what an asynchronous command needs to know before it blocks (perhaps representing what happens next by storing a callback). I suppose one difference is that a blocking asynchronous command does so at a well defined point whereas a thread can be interrupted anywhere. If there really is a substantial difference in terms of the cost of keeping the state, where does it come from? I doubt it's the cost of the stack since that wastes at most a 4KB page, so it's a non-issue even for 1000s of blocked threads.
Many thanks, and sorry if the question is very basic. It might be I just cannot figure out the right incantation to type into Google.
Threads consume memory, as they need to have their state preserved even when they aren't doing anything. If you make an asynchronous call on a single thread, it's literally not consuming any resources (aside from a record somewhere that the call was made) until it needs to be resolved, because if it isn't actively being processed you don't care about it.
If the architecture of your application is written in a way that the resources it needs scale linearly (or worse) with the number of users / traffic you receive, then that is going to be a problem. You can watch this talk about node.js if you want to watch someone talk at length about this.
https://www.youtube.com/watch?v=ztspvPYybIY

NodeJS: Parallelism of async module

I am following the async module's each method (https://github.com/caolan/async#each). It says the method iterates over the array in parallel. "Parallel" is the word that confuses me. AFAIK, there is no way JavaScript can execute code in parallel because it has a single-threaded model.
The examples shown for the each method focus on I/O scenarios. I am using the "each" method just to add the numbers of an array. If parallelism exists, can I prove it using my example?
Thanks for reading.
The 'parallel' in the async documentation doesn't refer to 'parallel' in terms of concurrency (like multiple processes or threads being run at the same time), but 'parallel' in terms of each step being independent of the other steps (the opposite operation would be eachSeries, where each step is run only after the previous has finished).
The parallel version would only make sense if the steps perform some kind of I/O, which (because of Node's asynchronous nature) could run parallel to each other: if one step has to wait for I/O, the other steps can happily continue to send/receive data.
If the steps are mainly cpu-bound (that is, performing lots of calculations), it's not going to provide you any better performance because, like you say, Node runs the interpreter in a single thread, and that's not something that async changes.
As robertklep said, it is concurrent rather than parallel. You are not going to achieve much performance gain by doing compute-heavy code in "parallel". It is useful when you have to do parallel I/O (communicating with an external web service for all the items of an array, for example).
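To make the distinction concrete, here is a minimal sketch (the URLs are placeholders) of each with an I/O-bound iterator: both requests are in flight at once, and their completions interleave, whereas eachSeries would start each request only after the previous one finished:

    const async = require('async');
    const https = require('https');

    const urls = ['https://example.com/a', 'https://example.com/b'];

    // "Parallel": both requests are started immediately.
    async.each(urls, (url, done) => {
      https.get(url, (res) => {
        res.resume();                 // drain the response body
        res.on('end', () => done());
      }).on('error', done);
    }, (err) => {
      if (err) console.error(err);
      else console.log('all requests finished');
    });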

Which would be better for concurrent tasks on node.js? Fibers? Web-workers? or Threads?

I stumbled upon node.js some time ago and I like it a lot. But soon I found out that it badly lacked the ability to perform CPU-intensive tasks. So, I started googling and got these answers to solve the problem: Fibers, Webworkers and Threads (thread-a-gogo). Now I'm confused about which one to use, and one of them definitely needs to be used - after all, what's the purpose of having a server which is just good at IO and nothing else? Suggestions needed!
UPDATE:
I have been thinking of an approach of late; I just need suggestions on it. What I thought of was this: let's have some threads (using thread_a_gogo or maybe webworkers). Now, when we need more of them, we can create more. But there will be some limit on the creation process (not imposed by the system, but probably because of overhead). Now, when we exceed the limit, we can fork a new Node process and start creating threads on it. This way, it can go on till we reach some limit (after all, processes too have a big overhead). When this limit is reached, we start queuing tasks. Whenever a thread becomes free, it will be assigned a new task. This way, it can go on smoothly.
So, that was what I thought of. Is this idea good? I am a bit new to all this process and threads stuff, so don't have any expertise in it. Please share your opinions.
Thanks. :)
Node has a completely different paradigm, and once it is correctly captured, it is easier to see this different way of solving problems. You never need multiple threads in a Node application(1) because you have a different way of doing the same thing. You create multiple processes; but this is very, very different from how, for example, the Apache web server's prefork MPM works.
For now, let's assume we have just one CPU core and we will develop an application (in Node's way) to do some work. Our job is to process a big file, running over its contents byte by byte. The best way for our software is to start the work at the beginning of the file and follow it byte by byte to the end.
-- Hey, Hasan, I suppose you are either a newbie or very old school from my Grandfather's time!!! Why don't you create some threads and make it much faster?
-- Oh, we have only one CPU core.
-- So what? Create some threads man, make it faster!
-- It does not work like that. If I create threads, I will be making it slower. Because I will be adding a lot of overhead to the system: switching between threads, trying to give each a fair amount of time, and communicating between the threads inside my process. In addition to all these facts, I will also have to think about how I will divide a single job into multiple pieces that can be done in parallel.
-- Okay okay, I see you are poor. Let's use my computer, it has 32 cores!
-- Wow, you are awesome my dear friend, thank you very much. I appreciate it!
Then we turn back to work. Now we have 32 CPU cores thanks to our rich friend. The rules we have to abide by have just changed. Now we want to utilize all this wealth we are given.
To use multiple cores, we need to find a way to divide our work into pieces that we can handle in parallel. If it was not Node, we would use threads for this; 32 threads, one for each cpu core. However, since we have Node, we will create 32 Node processes.
Threads can be a good alternative to Node processes, maybe even a better way; but only in a specific kind of job where the work is already defined and we have complete control over how to handle it. Other than this, for every other kind of problem where the job comes from outside in a way we do not have control over and we want to answer as quickly as possible, Node's way is unarguably superior.
-- Hey, Hasan, are you still working single-threaded? What is wrong with you, man? I have just provided you what you wanted. You have no excuses anymore. Create threads, make it run faster.
-- I have divided the work into pieces and every process will work on one of these pieces in parallel.
-- Why don't you create threads?
-- Sorry, I don't think it is usable. You can take your computer if you want?
-- No okay, I am cool, I just don't understand why you don't use threads?
-- Thank you for the computer. :) I already divided the work into pieces and I create processes to work on these pieces in parallel. All the CPU cores will be fully utilized. I could do this with threads instead of processes; but Node has this way and my boss Parth Thakkar wants me to use Node.
-- Okay, let me know if you need another computer. :p
If I create 33 processes instead of 32, the operating system's scheduler will be pausing one, starting another, pausing it after some cycles, starting the first again... This is unnecessary overhead. I do not want it. In fact, on a system with 32 cores, I wouldn't even want to create exactly 32 processes; 31 might be better. Because it is not just my application that will work on this system. Leaving a little room for other things can be good, especially when we have 32 rooms.
I believe we are on the same page now about fully utilizing processors for CPU-intensive tasks.
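(As a side note, here is a minimal sketch of this one-process-per-core idea using Node's built-in cluster module; the worker body is a placeholder for a real share of the job:)

    const cluster = require('cluster');
    const os = require('os');

    if (cluster.isMaster) {
      // Fork one worker per CPU core (or one less, leaving room for the OS).
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();
      cluster.on('exit', (worker) =>
        console.log(`worker ${worker.process.pid} finished`));
    } else {
      // Each worker processes its own piece of the divided job here.
      console.log(`worker ${process.pid} working on piece ${cluster.worker.id}`);
      process.exit(0);
    }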
-- Hmm, Hasan, I am sorry for mocking you a little. I believe I understand you better now. But there is still something I need an explanation for: what is all the buzz about running hundreds of threads? I read everywhere that threads are much faster to create and dump than forking processes. You fork processes instead of threads and you think it is the most you can get with Node. Then is Node not appropriate for this kind of work?
-- No worries, I am cool, too. Everybody says these things so I think I am used to hearing them.
-- So? Node is not good for this?
-- Node is perfectly good for this even though threads can be good too. As for thread/process creation overhead; on things that you repeat a lot, every millisecond counts. However, I create only 32 processes and it will take a tiny amount of time. It will happen only once. It will not make any difference.
-- When do I want to create thousands of threads, then?
-- You never want to create thousands of threads. However, on a system that is doing work that comes from outside, like a web server processing HTTP requests; if you are using a thread for each request, you will be creating a lot of threads, many of them.
-- Node is different, though? Right?
-- Yes, exactly. This is where Node really shines. Just as a thread is much lighter than a process, a function call is much lighter than a thread. Node calls functions instead of creating threads. In the example of a web server, every incoming request causes a function call.
-- Hmm, interesting; but you can only run one function at the same time if you are not using multiple threads. How can this work when a lot of requests arrive at the web server at the same time?
-- You are perfectly right about how functions run, one at a time, never two in parallel. I mean in a single process, only one scope of code is running at a time. The OS Scheduler does not come and pause this function and switch to another one, unless it pauses the process to give time to another process, not another thread in our process. (2)
-- Then how can a process handle 2 requests at a time?
-- A process can handle tens of thousands of requests at a time as long as our system has enough resources (RAM, Network, etc.). How those functions run is THE KEY DIFFERENCE.
-- Hmm, should I be excited now?
-- Maybe :) Node runs a loop over a queue. In this queue are our jobs, i.e., the calls we started in order to process incoming requests. The most important point here is the way we design our functions to run. Instead of starting to process a request and making the caller wait until we finish the job, we quickly end our function after doing an acceptable amount of work. When we come to a point where we need to wait for another component to do some work and return us a value, instead of waiting, we simply finish our function, adding the rest of the work to the queue.
-- It sounds too complex?
-- No, no; it might sound complex, but the system itself is very simple and it makes perfect sense.
Now I want to stop citing the dialogue between these two developers and finish my answer after a last quick example of how these functions work.
In this way, we are doing what the OS scheduler would normally do. We pause our work at some point and let other function calls (like other threads in a multi-threaded environment) run until we get our turn again. This is much better than leaving the work to the OS scheduler, which tries to give a fair share of time to every thread on the system. We know what we are doing much better than the OS scheduler does, and we are expected to stop when we should stop.
Below is a simple example where we open a file and read it to do some work on the data.
Synchronous way:

    Open file
    Repeat this:
        Read some
        Do the work

Asynchronous way:

    Open file and do this when it is ready: // Our function returns
        Repeat this:
            Read some and when it is ready: // Returns again
                Do some work
As you see, our function asks the system to open a file and does not wait for it to be opened. It finishes itself by providing the next steps for when the file is ready. When we return, Node runs the other function calls on the queue. After running over all the functions, the event loop moves to the next turn...
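For reference, here is a simplified, concrete Node version of the asynchronous pseudocode above, using the callback-style fs API; the file name and doWork are placeholders:

    const fs = require('fs');

    fs.open('data.txt', 'r', (err, fd) => {      // Open file, and when it is ready...
      if (err) throw err;
      const buffer = Buffer.alloc(4096);

      function readSome() {                      // Repeat this:
        fs.read(fd, buffer, 0, buffer.length, null, (err, bytesRead) => {
          if (err) throw err;
          if (bytesRead === 0) return fs.close(fd, () => {}); // end of file
          doWork(buffer.slice(0, bytesRead));    // Do some work
          readSome();                            // ...then ask to read some more
        });
      }
      readSome();
    });

    function doWork(chunk) {
      console.log(`processed ${chunk.length} bytes`); // placeholder processing
    }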
In summary, Node has a completely different paradigm than multi-threaded development; but this does not mean that it lacks things. For a synchronous job (where we can decide the order and way of processing), it works as well as multi-threaded parallelism. For a job that comes from outside like requests to a server, it simply is superior.
(1) Unless you are building libraries in other languages like C/C++, in which case you still do not create threads for dividing jobs. For this kind of work you have two threads, one of which continues communication with Node while the other does the real work.
(2) In fact, every Node process has multiple threads, for the same reasons I mentioned in the first footnote. However, this is nothing like 1000 threads doing similar work. Those extra threads are for things like accepting I/O events and handling inter-process messaging.
UPDATE (As reply to a good question in comments)
@Mark, thank you for the constructive criticism. In Node's paradigm, you should never have functions that take too long to process unless all the other calls in the queue are designed to be run one after another. In the case of computationally expensive tasks, if we look at the complete picture, we see that the question is not "Should we use threads or processes?" but "How can we divide these tasks, in a well-balanced manner, into sub-tasks that we can run in parallel, employing multiple CPU cores on the system?" Let's say we will process 400 video files on a system with 8 cores. If we want to process one file at a time, then we need a system that will process different parts of the same file, in which case, maybe, a multi-threaded single-process system will be easier to build and even more efficient. We can still use Node for this by running multiple processes and passing messages between them when state-sharing/communication is necessary. As I said before, a multi-process approach with Node works as well as a multi-threaded approach for this kind of task, but not better. Again, as I said before, the situation where Node shines is when these tasks come as input to the system from multiple sources, since keeping many connections open concurrently is much lighter in Node than in a thread-per-connection or process-per-connection system.
As for setTimeout(...,0) calls: sometimes giving a break during a time-consuming task, to allow the calls in the queue to have their share of processing, can be required. Dividing tasks in different ways can save you from this; but still, this is not really a hack, it is just the way event queues work. Also, using process.nextTick for this aim is much better, since with setTimeout the runtime has to calculate and check the time passed, while process.nextTick is simply what we really want: "Hey task, go back to the end of the queue, you have used your share!"
(Update 2016: Web workers were expected to go into io.js - a Node.js fork - and later Node.js v7 - see below.)
(Update 2017: Web workers did not go into Node.js v7 or v8 - see below.)
(Update 2018: Web workers finally went into Node v10.5.0 - see below.)
Some clarification
Having read the answers above, I would like to point out that there is nothing in web workers that is against the philosophy of JavaScript in general, or of Node in particular, regarding concurrency. (If there were, it wouldn't even be discussed by the WHATWG, much less implemented in the browsers.)
You can think of a web worker as a lightweight microservice that is accessed asynchronously. No state is shared. No locking problems exist. There is no blocking. There is no synchronization needed. Just like when you use a RESTful service from your Node program you don't worry that it is now "multithreaded" because the RESTful service is not in the same thread as your own event loop. It's just a separate service that you access asynchronously and that is what matters.
The same is true for web workers. It's just an API to communicate with code that runs in a completely separate context; whether that is a different thread, a different process, a different cgroup, zone, container, or a different machine is completely irrelevant, thanks to a strictly asynchronous, non-blocking API with all data passed by value.
As a matter of fact, web workers are conceptually a perfect fit for Node, which - as many people are not aware - already uses threads quite heavily; in fact "everything runs in parallel except your code" - see:
Understanding the node.js event loop by Mikito Takada
Understanding node.js by Felix Geisendörfer
Understanding the Node.js Event Loop by Trevor Norris
Node.js itself is blocking, only its I/O is non-blocking by Jeremy Epstein
But the web workers don't even need to be implemented using threads. You could use processes, green threads, or even RESTful services in the cloud - as long as the web worker API is used. The whole beauty of the message passing API with call by value semantics is that the underlying implementation is pretty much irrelevant, as the details of the concurrency model will not get exposed.
A single-threaded event loop is perfect for I/O-bound operations. It doesn't work that well for CPU-bound operations, especially long running ones. For that we need to spawn more processes or use threads. Managing child processes and the inter-process communication in a portable way can be quite difficult and it is often seen as an overkill for simple tasks, while using threads means dealing with locks and synchronization issues that are very difficult to do right.
What is often recommended is to divide long-running CPU-bound operations into smaller tasks (something like the example in the "Original answer" section of my answer to Speed up setInterval) but it is not always practical and it doesn't use more than one CPU core.
I'm writing this to clarify the comments that were basically saying that web workers were created for browsers, not servers (forgetting that the same can be said about pretty much everything in JavaScript).
Node modules
There are a few modules that are supposed to add Web Workers to Node:
https://github.com/pgriess/node-webworker
https://github.com/audreyt/node-webworker-threads
I haven't used either of them, but I have two quick observations that may be relevant: as of March 2015, node-webworker was last updated 4 years ago and node-webworker-threads was last updated a month ago. Also, I see in the node-webworker-threads usage example that you can pass a function instead of a file name to the Worker constructor, which seems like it may cause subtle problems if it is implemented using threads that share memory (unless the function is used only for its .toString() method and is otherwise compiled in a different environment, in which case it may be fine - I'd have to look into it more deeply; I'm just sharing my observations here).
If there is any other relevant project that implements web workers API in Node, please leave a comment.
Update 1
I didn't know it at the time of writing, but incidentally, one day before I wrote this answer, Web Workers were added to io.js.
(io.js is a fork of Node.js - see: Why io.js decided to fork Node.js, an InfoWorld interview with Mikeal Rogers, for more info.)
Not only does it prove the point that there is nothing in web workers that is against the philosophy of JavaScript in general and Node in particular regarding concurrency, but it may result in web workers being a first class citizen in server-side JavaScript like io.js (and possibly Node.js in the future) just as it already is in client-side JavaScript in all modern browsers.
Update 2
In Update 1 and my tweet I was referring to io.js pull request #1159, which now redirects to Node PR #1159. That PR was closed on Jul 8 and replaced with Node PR #2133, which is still open.
There is some discussion taking place under those pull requests that may provide some more up to date info on the status of Web workers in io.js/Node.js.
Update 3
Latest info - thanks to NiCk Newman for posting it in the comments: there is the workers: initial implementation commit by Petka Antonov from Sep 6, 2015, which can be downloaded and tried out in this tree. See the comments by NiCk Newman for details.
Update 4
As of May 2016 the last comments on the still open PR #2133 - workers: initial implementation were 3 months old. On May 30 Matheus Moreira asked me to post an update to this answer in the comments below and he asked for the current status of this feature in the PR comments.
The first comments in the PR discussion were skeptical, but later Ben Noordhuis wrote that "Getting this merged in one shape or another is on my todo list for v7."
All the other comments seemed to second that, and as of July 2016 it seemed that Web Workers should be available in the next version of Node, version 7.0, planned for release in October 2016 (not necessarily in the form of this exact PR).
Thanks to Matheus Moreira for pointing it out in the comments and reviving the discussion on GitHub.
Update 5
As of July 2016 there are a few modules on npm that were not available before - for a complete list of relevant modules, search npm for workers, web workers, etc. If anything in particular does or doesn't work for you, please post a comment.
Update 6
As of January 2017 it is unlikely that web workers will get merged into Node.js.
The pull request #2133 workers: initial implementation by Petka Antonov from July 8, 2015 was finally closed by Ben Noordhuis on December 11, 2016 who commented that "multi-threading support adds too many new failure modes for not enough benefit" and "we can also accomplish that using more traditional means like shared memory and more efficient serialization."
For more information see the comments to the PR 2133 on GitHub.
Thanks again to Matheus Moreira for pointing it out in the comments.
Update 7
I'm happy to announce that a few days ago, in June 2018, web workers appeared in Node v10.5.0 as an experimental feature activated with the --experimental-worker flag.
For more info, see:
Node v10.5.0 release blog post
Pull Request #20876 - worker: initial implementation by Anna Henningsen
My original tweet of happiness when I learned that this got into v10.5.0:
🎉🎉🎉 Finally! I can make the 7th update to my 3 year old Stack Overflow answer where I argue that threading a la web workers is not against Node philosophy, only this time saying that we finally got it! 😜👍
I come from the old school of thought where we used multi-threading to make software fast. For the past 3 years I have been using Node.js and I am a big supporter of it. hasanyasin explained in detail how Node works and the concept of asynchronous functionality. But let me add a few things here.
Back in the old days with single cores and lower clock speeds, we tried various ways to make software work fast and in parallel. In DOS days we used to run one program at a time. Then in Windows we started running multiple applications (processes) together. Concepts like preemptive and non-preemptive (or cooperative) multitasking were tested. We now know that preemptive multitasking was the answer to better multi-processing on single-core computers. Along came the concepts of processes/tasks and context switching. Then came the concept of the thread, to further reduce the burden of process context switching. Threads were coined as a lightweight alternative to spawning new processes.
So like it or not, single thread or not, multi-core or single core, your processes will be preempted and time-sliced by the OS.
Node.js is a single process and provides an async mechanism. Here, jobs are dispatched to the underlying OS to perform tasks while we wait in an event loop for the task to finish. Once we get a green signal from the OS, we perform whatever we need to do. Now, in a way, this is cooperative/non-preemptive multitasking, so we should never block the event loop for a very long period of time, otherwise we will degrade our application very fast.
So if there is ever a task that is blocking in nature or very time consuming, we will have to branch it out to the preemptive world of the OS and threads.
There are good examples of this in the libuv documentation. Also, if you read the documentation further, you'll find that file I/O is handled in threads in node.js.
So firstly, it's all in the design of our software. Secondly, context switching is always happening, no matter what they tell you. Threads are there, and are still there for a reason: they are faster to switch between than processes.
Under the hood in node.js it's all C++ and threads. And Node provides a C++ way to extend its functionality and to further speed things up by using threads where they are a must, i.e., blocking tasks such as reading from a source, writing to a source, and large data analysis.
I know hasanyasin's answer is the accepted one, but for me threads will exist no matter what you say or how you hide them behind scripts. Secondly, nobody just breaks things into threads just for speed; it is mostly done for blocking tasks. And threads are in the backbone of Node.js, so bashing multi-threading outright is incorrect. Also, threads are different from processes, and the limit of one Node process per core doesn't exactly apply to the number of threads; threads are like sub-tasks to a process. In fact, threads won't show up in your Windows Task Manager or the Linux top command by default. Once again, they are more lightweight than processes.
I'm not sure if webworkers are relevant in this case; they are client-side tech (they run in the browser), while node.js runs on the server. Fibers, as far as I understand, are also blocking, i.e. they are voluntary multitasking, so you could use them, but you would have to manage context switches yourself via yield. Threads might actually be what you need, but I don't know how mature they are in node.js.
worker_threads has been implemented and shipped behind a flag in node 10.5.0. It's still an initial implementation, and more work is needed to make it more efficient in future releases. It's worth giving it a try in the latest Node.
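A minimal single-file sketch of the worker_threads API as shipped in node 10.5.0 (run with the --experimental-worker flag; later versions dropped the flag):

    // Run with: node --experimental-worker example.js
    const { Worker, isMainThread, parentPort, workerData } =
      require('worker_threads');

    if (isMainThread) {
      // Main thread: offload a CPU-heavy loop so the event loop stays free.
      const worker = new Worker(__filename, { workerData: { n: 1e8 } });
      worker.on('message', (sum) => console.log('sum =', sum));
      worker.on('error', (err) => console.error(err));
    } else {
      // Worker thread: do the blocking computation and post the result back.
      let sum = 0;
      for (let i = 0; i < workerData.n; i++) sum += i;
      parentPort.postMessage(sum);
    }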
In many Node developers' opinions one of the best parts of Node is actually its single-threaded nature. Threads introduce a whole slew of difficulties with shared resources that Node completely avoids by doing nothing but non-blocking IO.
That's not to say that Node is limited to a single thread. It's just that the method for getting threaded concurrency is different from what you're looking for. The standard way to deal with threads is with the cluster module that comes standard with Node itself. It's a simpler approach to threads than manually dealing with them in your code.
For dealing with asynchronous programming in your code (as in, avoiding nested callback pyramids), the Future component in the Fibers library is a decent choice. I would also suggest you check out Asyncblock, which is based on Fibers. Fibers are nice because they allow you to hide callbacks by duplicating the stack and then jumping between stacks on a single thread as needed. This saves you the hassle of real threads while giving you their benefits. The downside is that stack traces can get a bit weird when using Fibers, but they aren't too bad.
If you don't need to worry about async stuff and are more just interested in doing a lot of processing without blocking, a simple call to process.nextTick(callback) every once in a while is all you need.
Maybe some more information on what tasks you are performing would help. Why would you (as you mentioned in your comment to genericdave's answer) need to create many thousands of them? The usual way of doing this sort of thing in Node is to start up a worker process (using fork or some other method) which always runs and can be communicated with using messages. In other words, don't start up a new worker each time you need to perform a task; simply send a message to the already-running worker and get a response when it's done. Honestly, I can't see that starting up many thousands of actual threads would be very efficient either; you are still limited by your CPUs.
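Here is a minimal sketch of that long-lived worker pattern using child_process.fork; worker.js is a hypothetical file name:

    // main.js - keep one long-lived worker and send it tasks as messages.
    const { fork } = require('child_process');
    const worker = fork('./worker.js');

    worker.on('message', (result) => console.log('result:', result));
    worker.send({ task: 'sum', n: 1e8 }); // reuse the same worker for every task

    // worker.js - runs until killed, handling one message at a time.
    process.on('message', ({ task, n }) => {
      if (task === 'sum') {
        let sum = 0;
        for (let i = 0; i < n; i++) sum += i;
        process.send(sum);
      }
    });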
Now, after saying all of that, I have been doing a lot of work with Hook.io lately which seems to work very well for this sort of off-loading tasks into other processes, maybe it can accomplish what you need.

How to articulate the difference between asynchronous and parallel programming?

Many platforms promote asynchrony and parallelism as means for improving responsiveness. I understand the difference generally, but often find it difficult to articulate in my own mind, as well as for others.
I am a workaday programmer and use async & callbacks fairly often. Parallelism feels exotic.
But I feel like they are easily conflated, especially at the language design level. Would love a clear description of how they relate (or don't), and the classes of programs where each is best applied.
When you run something asynchronously it means it is non-blocking, you execute it without waiting for it to complete and carry on with other things. Parallelism means to run multiple things at the same time, in parallel. Parallelism works well when you can separate tasks into independent pieces of work.
Take for example rendering frames of a 3D animation. To render the animation takes a long time so if you were to launch that render from within your animation editing software you would make sure it was running asynchronously so it didn't lock up your UI and you could continue doing other things. Now, each frame of that animation can also be considered as an individual task. If we have multiple CPUs/Cores or multiple machines available, we can render multiple frames in parallel to speed up the overall workload.
I believe the main distinction is between concurrency and parallelism.
Async and Callbacks are generally a way (tool or mechanism) to express concurrency i.e. a set of entities possibly talking to each other and sharing resources.
In the case of async or callbacks, communication is implicit, while sharing of resources is optional (consider RMI, where results are computed on a remote machine).
As correctly noted this is usually done with responsiveness in mind; to not wait for long latency events.
Parallel programming usually has throughput as the main objective, while latency, i.e. the completion time for a single element, might be worse than in an equivalent sequential program.
To better understand the distinction between concurrency and parallelism, I am going to quote from Probabilistic models for concurrency by Daniele Varacca, a good set of notes on concurrency theory:
A model of computation is a model for concurrency when it is able to represent systems as composed of independent autonomous components, possibly communicating with each other. The notion of concurrency should not be confused with the notion of parallelism. Parallel computations usually involve a central control which distributes the work among several processors. In concurrency we stress the independence of the components, and the fact that they communicate with each other. Parallelism is like ancient Egypt, where the Pharaoh decides and the slaves work. Concurrency is like modern Italy, where everybody does what they want, and all use mobile phones.
In conclusion, parallel programming is somewhat a special case of concurrency where separate entities collaborate to obtain high performance and throughput (generally).
Async and Callbacks are just a mechanism that allows the programmer to express concurrency.
Consider that well-known parallel programming design patterns such as master/worker or map/reduce are implemented by frameworks that use such lower level mechanisms (async) to implement more complex centralized interactions.
This article explains it very well: http://urda.cc/blog/2010/10/04/asynchronous-versus-parallel-programming
It has this about asynchronous programming:
Asynchronous calls are used to prevent “blocking” within an application. [Such a] call will spin-off in an already existing thread (such as an I/O thread) and do its task when it can.
this about parallel programming:
In parallel programming you still break up work or tasks, but the key difference is that you spin up new threads for each chunk of work
and this in summary:
asynchronous calls will use threads already in use by the system, and parallel programming requires the developer to break the work up and spin up and tear down the threads needed.
async: Do this by yourself somewhere else and notify me when you complete (callback). By that time I can continue doing my thing.
parallel: Hire as many guys (threads) as you wish and split the job among them to complete it quicker, and let me know (callback) when you are done. By that time I might continue doing my other stuff.
The main difference is that parallelism mostly depends on hardware.
My basic understanding is:
Asynchronous programming solves the problem of waiting around for an expensive operation to complete before you can do anything else. If you can get other stuff done while you're waiting for the operation to complete, then that's a good thing. Example: keeping a UI running while you go and retrieve more data from a web service.
Parallel programming is related but is more concerned with breaking a large task into smaller chunks that can be computed at the same time. The results of the smaller chunks can then be combined to produce the overall result. Example: ray-tracing where the colour of individual pixels is essentially independent.
It's probably more complicated than that, but I think that's the basic distinction.
I tend to think of the difference in these terms:
Asynchronous: Go away and do this task, when you're finished come back and tell me and bring the results. I'll be getting on with other things in the mean time.
Parallel: I want you to do this task. If it makes it easier, get some folks in to help. This is urgent though, so I'll wait here until you come back with the results. I can do nothing else until you come back.
Of course an asynchronous task might make use of parallelism, but the differentiation - to my mind at least - is whether you get on with other things while the operation is being carried out or if you stop everything completely until the results are in.
It is a question of order of execution.
If A is asynchronous with B, then I cannot predict beforehand when subparts of A will happen with respect to subparts of B.
If A is parallel with B, then things in A are happening at the same time as things in B. However, an order of execution may still be defined.
Perhaps the difficulty is that the word asynchronous is equivocal.
I execute an asynchronous task when I tell my butler to run to the store for more wine and cheese, and then forget about him and work on my novel until he knocks on the study door again. Parallelism is happening here, but the butler and I are engaged in fundamentally different tasks and of different social classes, so we don't apply that label here.
My team of maids is working in parallel when each of them is washing a different window.
My race car support team is asynchronously parallel in that each team works on a different tire and they don't need to communicate with each other or manage shared resources while they do their job.
My football (aka soccer) team does parallel work as each player independently processes information about the field and moves about on it, but they are not fully asynchronous because they must communicate and respond to the communication of others.
My marching band is also parallel as each player reads music and controls their instrument, but they are highly synchronous: they play and march in time to each other.
A cammed gatling gun could be considered parallel, but everything is 100% synchronous, so it is as though one process is moving forward.
Why Asynchronous ?
With today's applications growing more and more connected, and with potentially long-running tasks or blocking operations such as network I/O or database operations, it's very important to hide the latency of these operations by starting them in the background and returning to the user interface as quickly as possible. This is where asynchrony comes into the picture: responsiveness.
Why parallel programming?
With today's data sets growing larger and computations growing more complex, it's very important to reduce the execution time of these CPU-bound operations, in this case by dividing the workload into chunks and then executing those chunks simultaneously. We can call this "parallel."
Obviously this gives our application high performance.
Asynchronous
Let's say you are the point of contact for your client and you need to be responsive i.e. you need to share status, complexity of operation, resources required etc whenever asked. Now you have a time-consuming operation to be done and hence cannot take this up as you need to be responsive to the client 24/7. Hence, you delegate the time-consuming operation to someone else so that you can be responsive. This is asynchronous.
Parallel programming
Let's say you have a task to read, say, 100 lines from a text file, and reading one line takes 1 second. Hence, you'll require 100 seconds to read the text file. Now you're worried that the client must wait for 100 seconds for the operation to finish. Hence you create 9 more clones and make each of them read 10 lines from the text file. Now the time taken is only 10 seconds to read 100 lines. Hence you have better performance.
To sum up, asynchronous coding is done to achieve responsiveness and parallel programming is done for performance.
Asynchronous: Running a method or task in the background, without blocking. It may not necessarily run on a separate thread. Uses context switching / time scheduling.
Parallel tasks: Each task runs in parallel. Does not use context switching / time scheduling.
I came here fairly comfortable with the two concepts, but with something not clear to me about them.
After reading through some of the answers, I think I have a correct and helpful metaphor to describe the difference.
If you think of your individual lines of code as separate but ordered playing cards (stop me if I am explaining how old-school punch cards work), then for each separate procedure written you will have a unique stack of cards (don't copy & paste!), and the difference between what goes on when you run code normally and when you run it asynchronously depends on whether you care or not.
When you run the code, you hand the OS a set of single operations (that your compiler or interpreter broke your "higher" level code into) to be passed to the processor. With one processor, only one line of code can be executed at any one time. So, in order to accomplish the illusion of running multiple processes at the same time, the OS uses a technique in which it sends the processor only a few lines from a given process at a time, switching between all the processes according to how it sees fit. The result is multiple processes showing progress to the end user at what seems to be the same time.
For our metaphor, the relationship is that the OS always shuffles the cards before sending them to the processor. If your stack of cards doesn't depend on another stack, you don't notice that your stack stopped getting selected from while another stack became active. So if you don't care, it doesn't matter.
However, if you do care (e.g., there are multiple processes - or stacks of cards - that do depend on each other), then the OS's shuffling will screw up your results.
Writing asynchronous code requires handling the dependencies between the order of execution regardless of what that ordering ends up being. This is why constructs like "call-backs" are used. They say to the processor, "the next thing to do is tell the other stack what we did". By using such tools, you can be assured that the other stack gets notified before it allows the OS to run any more of its instructions. ("If called_back == false: send(no_operation)" - not sure if this is actually how it is implemented, but logically, I think it is consistent.)
For parallel processes, the difference is that you have two stacks that don't care about each other and two workers to process them. At the end of the day, you may need to combine the results from the two stacks, which would then be a matter of synchronicity but, for execution, you don't care again.
Not sure if this helps, but I always find multiple explanations helpful. Also, note that asynchronous execution is not constrained to an individual computer and its processors. Generally speaking, it deals with time, or (even more generally speaking) an order of events. So if you send dependent stack A to network node X and its coupled stack B to Y, the correct asynchronous code should be able to account for the situation as if it were running locally on your laptop.
Generally, there are only two ways you can do more than one thing each time. One is asynchronous, the other is parallel.
From a high level, popular servers like NGINX and famous Python libraries like Tornado fully utilize the asynchronous paradigm: a single-threaded server can simultaneously serve thousands of clients (using an IO loop and callbacks). Using ECF (exceptional control flow), one can implement the asynchronous programming paradigm. So asynchronous code doesn't always do things simultaneously, but for I/O-bound work asynchrony can really improve performance.
The parallel paradigm always refers to multi-threading and multiprocessing. This can fully utilize multi-core processors and do things truly simultaneously.
Summary of all the above answers
Parallel computing:
▪ solves the throughput issue; concerned with breaking a large task into smaller chunks
▪ is machine related (multiple machines/cores/CPUs/processors needed), e.g. master/slave, map/reduce. Parallel computations usually involve a central control which distributes the work among several processors.
Asynchronous:
▪ solves the latency issue, i.e. the problem of 'waiting around' for an expensive operation to complete before you can do anything else
▪ is thread related (multiple threads needed). Threading (using Thread, Runnable, Executor) is one fundamental way to perform asynchronous operations in Java.

Threads or asynch?

How do you make your application multithreaded ?
Do you use asynch functions ?
or do you spawn a new thread ?
I think that asynch functions are already spawning a thread, so if your job is just some file reading, being lazy and just spawning your job on a thread would just "waste" resources...
So is there some kind of design rule for when to use threads versus asynch functions?
If you are talking about .Net, then don't forget the ThreadPool. The thread pool is also what asynch functions often use. Spawning too many threads can actually hurt your performance. A thread pool is designed to spawn just enough threads to do the work the fastest. So do use a thread pool instead of spawning your own threads, unless the thread pool doesn't meet your needs.
PS: And keep an eye on the Parallel Extensions from Microsoft.
Spawning threads is only going to waste resources if you start spawning tons of them; one or two extra threads isn't going to affect the platform's performance. In fact, System currently has over 70 threads for me, and MSN is using 32 (I really have no idea how a messenger can use that many threads, especially when it's minimised and not really doing anything...).
Usually a good time to spawn a thread is when something will take a long time but you need to keep doing something else.
E.g. say a calculation will take 30 seconds. The best thing to do is spawn a new thread for the calculation, so that you can continue to update the screen and handle any user input, because users will hate it if your app freezes until it's finished the calculation.
On the other hand, creating threads to do something that can be done almost instantly is nearly pointless, since the overhead of creating a thread (or even just passing work to an existing thread using a thread pool) will be higher than just doing the job in the first place.
Sometimes you can break your app into a couple of separate parts which run in their own threads. For example, in games the updates/physics may be one thread, while graphics are another, sound/music is a third, and networking is another. The problem here is that you really have to think about how these parts will interact, or else you may have worse performance, bugs that happen seemingly "randomly", or even deadlock.
I'll second Fire Lancer's answer - creating your own threads is an excellent way to process big tasks or to handle a task that would otherwise be "blocking" to the rest of a synchronous app, but you have to have a clear understanding of the problem you must solve, and you have to develop in a way that clearly defines the task of a thread and limits the scope of what it does.
For an example I recently worked on: a Java console app runs periodically to capture data by essentially screen-scraping URLs, parsing the documents with DOM, extracting data and storing it in a database.
As a single-threaded application it took, as you would expect, an age, averaging around 1 URL a second for a 50kb page. Not too bad, but when you scale out to needing to process thousands of URLs in a batch, it's no good.
Profiling the app showed that most of the time the active thread was idle - it was waiting for I/O operations: opening a socket to the remote URL, opening a connection to the database, etc. It's this sort of situation that can easily be improved with multithreading. Rewriting it to be multi-threaded, with just 5 threads instead of one, gave an increase in throughput of over 20 times, even on a single-core CPU.
In this example, each "worker" thread was explicitly limited in what it did - open a remote URL, parse the data, store it in the db. All the "high level" processing - generating the list of URLs to parse, working out which comes next, handling errors - remained under the control of the main thread.
The use of threads makes you think more about the way your application needs threading and can in the long run make it easier to improve / control your performance.
Async methods are faster to use, but they are a bit magic - a lot of things happen to make them possible - so it's probable that at some point you will need something they can't give you. Then you can try to roll some custom threading code.
It all depends on your needs.
The answer is "it depends".
It depends on what you're trying to achieve. I'm going to assume that you're aiming for more performance.
The simplest solution is to find another way to improve your performance. Run a profiler. Look for hot spots. Reduce unnecessary IO.
The next solution is to break your program into multiple processes, each of which can run in their own address space. This is easiest because there is no chance of the individual processes messing each other up.
The next solution is to use threads. At this point you're opening a major can of worms, so start small, and only multi-thread the critical path of the code.
The next solution is to use asynch IO. This is generally only recommended for people writing very heavily loaded servers, and even then I would rather re-use one of the existing frameworks that abstract away the details, e.g. the C++ framework ICE, or an EJB server under Java.
Note that each of these solutions has multiple sub-solutions - there are different breeds of threads and different kinds of asynch IO, each with slightly different performance characteristics, but again, it's generally best to let the framework handle it for you.
