NodeJS: Parallelism of async module - node.js

I am following the async module's each method (https://github.com/caolan/async#each). It says the method iterates over the array parallely. "Parallely" is the word that confuses me. AFAIK, in now way JavaScript can execute code parallely because it has a single-threaded model.
The examples shown in the each method focuses on the IO scenarios. I am using the "each" method just to add numbers of the array. If parallelism exists, can I prove this using my example?
Thanks for reading.

The 'parallel' in the async documentation doesn't refer to 'parallel' in terms of concurrency (like multiple processes or threads being run at the same time), but 'parallel' in terms of each step being independent of the other steps (the opposite operation would be eachSeries, where each step is run only after the previous has finished).
The parallel version would only make sense if the steps perform some kind of I/O, which (because of Node's asynchronous nature) could run parallel to each other: if one step has to wait for I/O, the other steps can happily continue to send/receive data.
If the steps are mainly cpu-bound (that is, performing lots of calculations), it's not going to provide you any better performance because, like you say, Node runs the interpreter in a single thread, and that's not something that async changes.

Like robertklep said, it is more of concurrent instead of parallel. You are not going to achieve much performance gain by doing compute heavy code in parallel. It is useful when you have to do parallel I/O (communicating with an external web service for all the items of an array, for example).

Related

Dart is Single Threaded but why it uses Future Objects and perform asynchronous operations

In Documentation, Dart is Single Threaded but to perform two operations at a time we use future objects which work same as thread.
Use Future objects (futures) to perform asynchronous operations.
If Dart is single threaded then why it allows to perform asynchronous operations.
Note: Asynchronous operations are parallel operations which are called threads
You mentioned that :
Asynchronous operations are parallel operations which are called threads
First of all, Asynchronous operations are not exactly parallel or even concurrent. Its just simply means that we do not want to block our flow of execution(Thread) or wait for the response until certain work is done. But the way we implement Asynchronous operations could decide either it is parallel or concurrent.
Parallellism vs Concurrency ?
Parallelism is actually doing lots of things simultaneously at the
same time. ex - You are walking and at the same time you're digesting
you food. Both tasks are completely running parallel and exactly at the
same time.
While
Concurrency is the illusion of Parallelism.Tasks seems to be Executed
parallel but they aren't. It like handing lots of things at a time but
only doing one task at a specific time. ex - You are walking and suddenly stop to tie your show lace. After tying your shoe lace you again start walking.
Now coming to Dart, Future Objects along with async and await keywords are used to perform asynchronous task. Here asynchronous doesn't means that tasks will be executed parallel or concurrent to each other. Instead in Dart even the asynchronous task is executed on the same thread which means that while we wait for another task to be completed, we will continue executing our synchronous code . Future Objects are used to represent the result of task which will be done at some time in future.
If you want to really execute your task concurrently then consider using Isolates(Which runs in separate thread and doesn't shares it memory with the main thread(or spawning thread).
Why? Because it is a necessity. Some operations, like http requests or timers, are asynchronous in nature.
There are isolates which allow you to execute code in a different process. The difference to threads in other programming languages is that isolates do not share memory with each other (which would lead to concurrency issues), they only communicate through messages.
To receive these messages (or wrapped in a Future, the result of it), Dart uses an event loop.
The Event Loop and Dart
Are Futures in Dart threads?
Dart is single threaded, but it can call native code(like c/c++) to perform asynchronous operations, which can introduce new thread.
In Flutter, Flutter engine is implement in c++, which provide the low-level implementation of Flutter’s core API, including asynchronous tasks like file and network I/O through new thread underneath.
Like Dart, JavaScript is also single threaded, I find this video very helpful to understand "Single Threaded" thing. what the heck is event loop
Here are a few notes:
Asynchronous doesn't mean multi-threaded. It means the code is not run at the same time. Usually asyncronous just means that it is scheduled to be run on the same thread (Isolate) after other tasks have finished.
Dart isn't actually single threaded. You can create another thread by creating another Isolate. However, within an Isolate the Dart code runs on a single thread and separate Isolates don't share memory. They can only communicate by messages.
A Future says that a value (or an error) will be returned at some point in the future. It doesn't say which thread the work is done on. Most futures are done on the current Isolate, but some futures (IO, for example) can be done on separate threads.
See this answer for links to more resources.
I have an article explaining this https://medium.com/#truongsinh/flutter-dart-async-concurrency-demystify-1cc739aaae57
In short, Flutter/Dart is not technically single-threaded, even though Dart code is executed in a single thread. Dart is a concurrent language with message passing pattern, that can take full advantage of modern multi-core architecture, without worrying about lock or mutex. Blocking in Dart can be either I/O-bound or CPU-bound, which should be solved, respectively, by Future and Dart’s Isolate/Flutter’s compute.

Node.js and Underscore.js

I'm just starting out with Node.js and I understand that most operations must work with callbacks to be non-blocking. My question pertains to the methods that Underscore.js exposes. For example
_.shuffle([1, 2, 3, 4, 5, 6]);
Wouldn't this be considered synchronous code given that no callback is provided? Consider a large list to shuffle.
Trying to come to grips in terms of what libraries I can use with node without impacting the fundamentals of using node.
Thanks!
Node is single threaded so any work that needs to get done will eventually be done by that thread. The async nature of Node means that it always tries to keep itself busy with work instead of waiting around for data to be returned (things like database calls, networks calls, disk access, etc). When you read things about making sure code is asynchronous, these are the types of operations that people are talking about.
Shuffling a bunch of numbers is a bunch of work that has to be done by the single Node thread, making this type of call async wouldn't do anything. So yes, that call is synchronous and will block the thread but there really isn't an alternative (without spawning worker threads or additional node processes). This is one of the reasons that Node really isn't the best option if you have a lot of heavy computations to do since it will block the single thread. Node is best at doing lots and lots of short duration tasks quickly.
Note that shuffling a million numbers will probably still be faster than a single database call, so this particular operation wouldn't impact overall performance that much. If you need to shuffle 100 million numbers, Node probably isn't the right platform.
Yes, it's synchronous, but that's not a problem in this example (or really any of Underscore's methods).
The reason many node APIs are asynchronous is because they perform potentially long operations. In order to do so, the work is offloaded to native OS asynchronous facilities (sockets) or done on a separate thread. Only when the work is complete is the data marshaled back into JS land and a callback invoked.
In this case, you're dealing strictly with JavaScript-managed memory. Only one thread has access to JS memory; you can't share memory between threads. This means you must do your work (shuffling the array) synchronously.
Unless you're dealing with a truly large array, this won't be a problem.
The "never make synchronous calls in node" rule really applies to I/O and computationally expensive operations. That's why all the network, filesystem, crypto, and zlib APIs are asynchronous. You'll notice that other APIs like the URL/path parsing modules are synchronous calls.

How to make an event that takes time nonblocking?

I was thinking - since node.js runs in single thread, what if I want to do some algorithmically difficult computation (hard_and_complex_function()), that has nothing to do with I/O but takes a LOT of time? Can I make it non-blocking? Isn't it a disadvantage compared to multi-threading technologies - where I can simply run it in a separate thread?
While you are correct with regard to threads, you have at least two options that might address your problem at hand:
Use process.nextTick() to yield the CPU at appropriate points in the long computation.
Use a separate process (using child_process or Cluster) to carry out your long computation.
You may also want, for future use, to take a look at Generators and yield coming in ES6.
There is solution available for tacking long running application in Node.js. Have a look at below library :
https://github.com/xk/node-threads-a-gogo

How to avoid asynchronous waiting

I have an application where I need to query a database to get/put information. I can't do it synchronously as it would block my entire process until the function returns.
Basically I have a few functions that run one or more queries at certain points.
fun
stuff1
stuff2
stuff3
query1
stuff4
query2
stuff5
I could start the functions in separate threads, but then I would have to lock everything to prevent races (I think locking could be slow ?)
I could start the queries asynchronously and monitor them but then I would have to split my functions and use callbacks that would run when the qouery is over
I am interested in a general solution, but my platform is POSIX and the database is (unfortunately) mysql.
What would you do ? How would you handle this ?
Thank you for your time.
There are patterns that have been known and used for quite sometime and I believe they have not changed.
Running independent functions: Use different threads
Running independent functions and a then function dependent on all of them: Use different threads and joining them at the end - synchronise them. I do not know about POSIX but in .NET we have EventWaitHandle that can wait for multiple threads and notify when all finished.
Running functions that each depends on another: run on a single background thread and chain the callbacks. Again .NET offers Task<T> chaining which makes reading and writing the code much simpler. jquery now offers promise which is the same thing.
Depends on how complicated the situation is. In a simple scenario breaking the work across multiple functions which are given as callbacks to the query will work - and is a valid solution. In a more complicated scenario, you need some dependency injection framework like spring.
You can create a new thread that would handle a database queries queue. This thread would hold a list containing the next actions to perform on the database and would be accessed by a function like : MyDatabaseQueue.PerformActionWhenFree( Action a, Callback callmebackwhendone ). This thread would be responsible for creating one query thread at a time. That way you can always receive more queries in the queue and have only one database query thread at a time.
Well if the queries are tightly coupled, you can simply start a parallel thread with pthread_create and run them sequentially on that thread. Thus, your main thread won't be blocked and you still won't need to employ any locks.

How to articulate the difference between asynchronous and parallel programming?

Many platforms promote asynchrony and parallelism as means for improving responsiveness. I understand the difference generally, but often find it difficult to articulate in my own mind, as well as for others.
I am a workaday programmer and use async & callbacks fairly often. Parallelism feels exotic.
But I feel like they are easily conflated, especially at the language design level. Would love a clear description of how they relate (or don't), and the classes of programs where each is best applied.
When you run something asynchronously it means it is non-blocking, you execute it without waiting for it to complete and carry on with other things. Parallelism means to run multiple things at the same time, in parallel. Parallelism works well when you can separate tasks into independent pieces of work.
Take for example rendering frames of a 3D animation. To render the animation takes a long time so if you were to launch that render from within your animation editing software you would make sure it was running asynchronously so it didn't lock up your UI and you could continue doing other things. Now, each frame of that animation can also be considered as an individual task. If we have multiple CPUs/Cores or multiple machines available, we can render multiple frames in parallel to speed up the overall workload.
I believe the main distinction is between concurrency and parallelism.
Async and Callbacks are generally a way (tool or mechanism) to express concurrency i.e. a set of entities possibly talking to each other and sharing resources.
In the case of async or callback communication is implicit while sharing of resources is optional (consider RMI where results are computed in a remote machine).
As correctly noted this is usually done with responsiveness in mind; to not wait for long latency events.
Parallel programming has usually throughput as the main objective while latency, i.e. the completion time for a single element, might be worse than a equivalent sequential program.
To better understand the distinction between concurrency and parallelism I am going to quote from Probabilistic models for concurrency of Daniele Varacca which is a good set of notes for theory of concurrency:
A model of computation is a model for concurrency when it is able to represent systems as composed of independent autonomous components, possibly communicating with each other. The notion of concurrency should not be confused with the notion of parallelism. Parallel computations usually involve a central control which distributes the work among several processors. In concurrency we stress the independence of the components, and the fact that they communicate with each other. Parallelism is like ancient Egypt, where the Pharaoh decides and the slaves work. Concurrency is like modern Italy, where everybody does what they want, and all use mobile phones.
In conclusion, parallel programming is somewhat a special case of concurrency where separate entities collaborate to obtain high performance and throughput (generally).
Async and Callbacks are just a mechanism that allows the programmer to express concurrency.
Consider that well-known parallel programming design patterns such as master/worker or map/reduce are implemented by frameworks that use such lower level mechanisms (async) to implement more complex centralized interactions.
This article explains it very well: http://urda.cc/blog/2010/10/04/asynchronous-versus-parallel-programming
It has this about asynchronous programming:
Asynchronous calls are used to prevent “blocking” within an application. [Such a] call will spin-off in an already existing thread (such as an I/O thread) and do its task when it can.
this about parallel programming:
In parallel programming you still break up work or tasks, but the key differences is that you spin up new threads for each chunk of work
and this in summary:
asynchronous calls will use threads already in use by the system and parallel programming requires the developer to break the work up, spinup, and teardown threads needed.
async: Do this by yourself somewhere else and notify me when you complete(callback). By the time i can continue to do my thing.
parallel: Hire as many guys(threads) as you wish and split the job to them to complete quicker and let me know(callback) when you complete. By the time i might continue to do my other stuff.
the main difference is parallelism mostly depends on hardware.
My basic understanding is:
Asynchonous programming solves the problem of waiting around for an expensive operation to complete before you can do anything else. If you can get other stuff done while you're waiting for the operation to complete then that's a good thing. Example: keeping a UI running while you go and retrieve more data from a web service.
Parallel programming is related but is more concerned with breaking a large task into smaller chunks that can be computed at the same time. The results of the smaller chunks can then be combined to produce the overall result. Example: ray-tracing where the colour of individual pixels is essentially independent.
It's probably more complicated than that, but I think that's the basic distinction.
I tend to think of the difference in these terms:
Asynchronous: Go away and do this task, when you're finished come back and tell me and bring the results. I'll be getting on with other things in the mean time.
Parallel: I want you to do this task. If it makes it easier, get some folks in to help. This is urgent though, so I'll wait here until you come back with the results. I can do nothing else until you come back.
Of course an asynchronous task might make use of parallelism, but the differentiation - to my mind at least - is whether you get on with other things while the operation is being carried out or if you stop everything completely until the results are in.
It is a question of order of execution.
If A is asynchronous with B, then I cannot predict beforehand when subparts of A will happen with respect to subparts of B.
If A is parallel with B, then things in A are happening at the same time as things in B. However, an order of execution may still be defined.
Perhaps the difficulty is that the word asynchronous is equivocal.
I execute an asynchronous task when I tell my butler to run to the store for more wine and cheese, and then forget about him and work on my novel until he knocks on the study door again. Parallelism is happening here, but the butler and I are engaged in fundamentally different tasks and of different social classes, so we don't apply that label here.
My team of maids is working in parallel when each of them is washing a different window.
My race car support team is asynchronously parallel in that each team works on a different tire and they don't need to communicate with each other or manage shared resources while they do their job.
My football (aka soccer) team does parallel work as each player independently processes information about the field and moves about on it, but they are not fully asynchronous because they must communicate and respond to the communication of others.
My marching band is also parallel as each player reads music and controls their instrument, but they are highly synchronous: they play and march in time to each other.
A cammed gatling gun could be considered parallel, but everything is 100% synchronous, so it is as though one process is moving forward.
Why Asynchronous ?
With today's application's growing more and more connected and also potentially
long running tasks or blocking operations such as Network I/O or Database Operations.So it's very important to hide the latency of these operations by starting them in background and returning back to the user interface quickly as possible. Here Asynchronous come in to the picture, Responsiveness.
Why parallel programming?
With today's data sets growing larger and computations growing more complex. So it's very important to reduce the execution time of these CPU-bound operations, in this case, by dividing the workload into chunks and then executing those chunks simultaneously. We can call this as "Parallel" .
Obviously it will give high Performance to our application.
Asynchronous
Let's say you are the point of contact for your client and you need to be responsive i.e. you need to share status, complexity of operation, resources required etc whenever asked. Now you have a time-consuming operation to be done and hence cannot take this up as you need to be responsive to the client 24/7. Hence, you delegate the time-consuming operation to someone else so that you can be responsive. This is asynchronous.
Parallel programming
Let's say you have a task to read, say, 100 lines from a text file, and reading one line takes 1 second. Hence, you'll require 100 seconds to read the text file. Now you're worried that the client must wait for 100 seconds for the operation to finish. Hence you create 9 more clones and make each of them read 10 lines from the text file. Now the time taken is only 10 seconds to read 100 lines. Hence you have better performance.
To sum up, asynchronous coding is done to achieve responsiveness and parallel programming is done for performance.
Asynchronous: Running a method or task in background, without blocking. May not necessorily run on a separate thread. Uses Context Switching / time scheduling.
Parallel Tasks: Each task runs parallally. Does not use context switching / time scheduling.
I came here fairly comfortable with the two concepts, but with something not clear to me about them.
After reading through some of the answers, I think I have a correct and helpful metaphor to describe the difference.
If you think of your individual lines of code as separate but ordered playing cards (stop me if I am explaining how old-school punch cards work), then for each separate procedure written, you will have a unique stack of cards (don't copy & paste!) and the difference between what normally goes on when run code normally and asynchronously depends on whether you care or not.
When you run the code, you hand the OS a set of single operations (that your compiler or interpreter broke your "higher" level code into) to be passed to the processor. With one processor, only one line of code can be executed at any one time. So, in order to accomplish the illusion of running multiple processes at the same time, the OS uses a technique in which it sends the processor only a few lines from a given process at a time, switching between all the processes according to how it sees fit. The result is multiple processes showing progress to the end user at what seems to be the same time.
For our metaphor, the relationship is that the OS always shuffles the cards before sending them to the processor. If your stack of cards doesn't depend on another stack, you don't notice that your stack stopped getting selected from while another stack became active. So if you don't care, it doesn't matter.
However, if you do care (e.g., there are multiple processes - or stacks of cards - that do depend on each other), then the OS's shuffling will screw up your results.
Writing asynchronous code requires handling the dependencies between the order of execution regardless of what that ordering ends up being. This is why constructs like "call-backs" are used. They say to the processor, "the next thing to do is tell the other stack what we did". By using such tools, you can be assured that the other stack gets notified before it allows the OS to run any more of its instructions. ("If called_back == false: send(no_operation)" - not sure if this is actually how it is implemented, but logically, I think it is consistent.)
For parallel processes, the difference is that you have two stacks that don't care about each other and two workers to process them. At the end of the day, you may need to combine the results from the two stacks, which would then be a matter of synchronicity but, for execution, you don't care again.
Not sure if this helps but, I always find multiple explanations helpful. Also, note that asynchronous execution is not constrained to an individual computer and its processors. Generally speaking, it deals with time, or (even more generally speaking) an order of events. So if you send dependent stack A to network node X and its coupled stack B to Y, the correct asynchronous code should be able to account for the situation as if it was running locally on your laptop.
Generally, there are only two ways you can do more than one thing each time. One is asynchronous, the other is parallel.
From the high level, like the popular server NGINX and famous Python library Tornado, they both fully utilize asynchronous paradigm which is Single thread server could simultaneously serve thousands of clients (some IOloop and callback). Using ECF(exception control follow) which could implement the asynchronous programming paradigm. so asynchronous sometimes doesn't really do thing simultaneous, but some io bound work, asynchronous could really promotes the performance.
The parallel paradigm always refers multi-threading, and multiprocessing. This can fully utilize multi-core processors, do things really simultaneously.
Summary of all above answers
parallel computing:
▪ solves throughput issue.
Concerned with breaking a large task into smaller chunks
▪ is machine related (multi machine/core/cpu/processor needed), eg: master slave, map reduce.
Parallel computations usually involve a central control which distributes the work among several processors
asynchronous:
▪ solves latency issue
ie, the problem of 'waiting around' for an expensive operation to complete before you can do anything else
▪ is thread related (multi thread needed)
Threading (using Thread, Runnable, Executor) is one fundamental way to perform asynchronous operations in Java

Resources