How does async.parallel work in Node.js with a single thread?

The async library in Node.js provides many methods, like parallel and times, to execute multiple calls in parallel. How can this really be parallel when Node.js is single threaded and the event loop can pick only one job at a time?
Does this mean async.parallel is not really parallel but only asynchronous? I understand that asynchronous and parallel are totally different terms.
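For reference, a typical async.parallel call looks roughly like this (a minimal sketch; fetchUser and fetchOrders are made-up asynchronous I/O functions, not part of the library):

    const async = require('async');

    async.parallel([
        cb => fetchUser(cb),     // hypothetical async I/O task, e.g. an HTTP request
        cb => fetchOrders(cb)    // another independent I/O task
    ], (err, results) => {
        // runs once both tasks have called back (or as soon as one passes an error)
        if (err) return console.error(err);
        console.log(results);    // [userResult, ordersResult]
    });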

Related

Dart is single threaded, so why does it use Future objects and perform asynchronous operations?

According to the documentation, Dart is single threaded, but to perform two operations at a time we use Future objects, which work much like threads.
Use Future objects (futures) to perform asynchronous operations.
If Dart is single threaded, then why does it allow performing asynchronous operations?
Note: Asynchronous operations are parallel operations which are called threads
You mentioned that:
Asynchronous operations are parallel operations which are called threads
First of all, asynchronous operations are not necessarily parallel or even concurrent. It simply means that we do not want to block our flow of execution (thread) or wait for the response until certain work is done. The way we implement asynchronous operations decides whether they end up parallel or concurrent.
Parallelism vs Concurrency?
Parallelism is actually doing lots of things simultaneously, at the same time. For example, you are walking and at the same time you're digesting your food. Both tasks run completely in parallel, at exactly the same time.
While
Concurrency is the illusion of parallelism. Tasks seem to be executed in parallel, but they aren't. It's like handling lots of things at a time but only doing one task at a specific moment. For example, you are walking and suddenly stop to tie your shoe lace. After tying your shoe lace you start walking again.
Now coming to Dart: Future objects, along with the async and await keywords, are used to perform asynchronous tasks. Here asynchronous doesn't mean that tasks will be executed in parallel or concurrently with each other. Instead, in Dart even an asynchronous task is executed on the same thread, which means that while we wait for another task to be completed, we continue executing our synchronous code. Future objects are used to represent the result of a task which will be done at some time in the future.
If you want to truly execute your tasks concurrently, then consider using Isolates (which run in a separate thread and do not share memory with the main thread, i.e. the spawning thread).
Why? Because it is a necessity. Some operations, like http requests or timers, are asynchronous in nature.
There are isolates, which allow you to execute code in a different process. The difference from threads in other programming languages is that isolates do not share memory with each other (which would lead to concurrency issues); they only communicate through messages.
To receive these messages (or, wrapped in a Future, the result of them), Dart uses an event loop.
The Event Loop and Dart
Are Futures in Dart threads?
Dart is single threaded, but it can call native code (like C/C++) to perform asynchronous operations, which can introduce new threads.
In Flutter, the Flutter engine is implemented in C++ and provides the low-level implementation of Flutter's core API, including asynchronous tasks like file and network I/O, through new threads underneath.
Like Dart, JavaScript is also single threaded. I find this video very helpful for understanding the "single threaded" thing: what the heck is event loop.
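A tiny JavaScript sketch of that single-threaded scheduling (nothing here is library-specific; it just shows that an asynchronous callback runs on the same thread, after the synchronous code has finished):

    console.log('first');                 // runs immediately

    setTimeout(() => {
        console.log('third');             // queued on the event loop; runs only after
    }, 0);                                // the synchronous code below has finished

    console.log('second');                // still the same single thread
    // output: first, second, third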
Here are a few notes:
Asynchronous doesn't mean multi-threaded. It means the code is not run at the same time. Usually asynchronous just means that it is scheduled to be run on the same thread (Isolate) after other tasks have finished.
Dart isn't actually single threaded. You can create another thread by creating another Isolate. However, within an Isolate the Dart code runs on a single thread and separate Isolates don't share memory. They can only communicate by messages.
A Future says that a value (or an error) will be returned at some point in the future. It doesn't say which thread the work is done on. Most futures are done on the current Isolate, but some futures (IO, for example) can be done on separate threads.
See this answer for links to more resources.
I have an article explaining this: https://medium.com/@truongsinh/flutter-dart-async-concurrency-demystify-1cc739aaae57
In short, Flutter/Dart is not technically single-threaded, even though Dart code is executed in a single thread. Dart is a concurrent language with a message-passing pattern that can take full advantage of modern multi-core architectures, without worrying about locks or mutexes. Blocking in Dart can be either I/O-bound or CPU-bound, which should be solved, respectively, by Futures and Dart's Isolate / Flutter's compute.

using asyncio and threads

Would it make sense to use both asyncio and threading in the same Python project, so that code runs in different threads and, in some of them, asyncio is used to get sequential-looking code for asynchronous activities?
Or would trying to do this mean that I am missing some basic concept about the usage of either threading or asyncio?
I didn't understand what you're asking (the part about "sequential-looking code for asynchronous activities"), but since there are no answers I'll write down some thoughts.
Let's talk about why we need asyncio/threads at all. Imagine we have a task to make two requests.
If we use plain single-threaded non-async code, the only option is to make the request for one URL and, only after it's done, make the request for the other:
request(url1)
request(url2)
The problem here is that we do the job inefficiently: each function spends most of its execution time doing nothing, just waiting for network results. It would be cool if we could somehow use the CPU for the second request while the first one is stuck with network stuff and doesn't need it.
This problem can be solved (and usually is solved) by running the functions in different threads:
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=2) as e:
    e.submit(request, url1)
    e.submit(request, url2)
We would get the results faster this way. While the first request is stuck with the network, the CPU is able to do something useful for the second request in another thread.
This is, however, not an ideal solution: switching between threads has some cost, and the execution flow is more complex than in the first example. There should be a better way.
Using one function's idle period to start executing another function is what asyncio is generally about:
await asyncio.gather(
    async_request(url1),
    async_request(url2),
)
The event loop manages the execution flow: when the first coroutine reaches some I/O operation and the CPU can be used to do work elsewhere, the second coroutine starts. Later the event loop returns to resume execution of the first coroutine.
We get "parallel" requests and clean understandable code. Since we have parallelization in single thread, we just don't need another.
Actually, threads can still be useful when we use asyncio. If we are ready to pay for them, they can help us turn synchronous I/O functions into asynchronous ones very quickly:
async def async_request(url):
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(None, request, url)
But again, it's optional, and we can usually find a module to make requests (and perform other I/O tasks) asynchronously without threads.
I haven't come across any other cases where threads are useful in asynchronous programs.
Sure it may make sense.
Asynchronous code in principle runs a bunch of routines in the same thread.
This means that the moment one routine has to wait for input or output (I/O), it halts that routine temporarily and simply starts processing another routine, until it encounters a wait there as well, and so on.
Multi-threaded (or "parallelized") code runs, in principle, at the same time on different cores of your machine. (Note that in Python parallel processing is achieved by using multiple processes, as pointed out by @Yassine Faris below.)
It may make perfect sense to use both in the same program. Use asyncio in order to keep processing while waiting for I/O. Use multi-threading (multi processing in Python) to do, for example, heavy calculations in parallel in another part of your program.

Run parallel processes in Node.js to handle SQS messages

SQS allows MaxNumberOfMessages = 10 ("The maximum number of messages to return. Amazon SQS never returns more messages than this value but may return fewer.") when fetching messages at once. So is there any way we can run multiple parallel processes in Node.js which can handle many SQS messages?
Is any npm package available for that?
Async might not be the right option, as its "parallel" operations are not really run in parallel.
https://github.com/caolan/async#paralleltasks-callback
parallel(tasks, [callback])
Run the tasks collection of functions in parallel, without waiting until the previous function has completed. If any of the functions pass an error to its callback, the main callback is immediately called with the value of the error. Once the tasks have completed, the results are passed to the final callback as an array.
Note: parallel is about kicking-off I/O tasks in parallel, not about parallel execution of code. If your tasks do not use any timers or perform any I/O, they will actually be executed in series. Any synchronous setup sections for each task will happen one after the other. JavaScript remains single-threaded.
It is also possible to use an object instead of an array. Each property will be run as a function and the results will be passed to the final callback as an object instead of an array. This can be a more readable way of handling results from parallel.
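As a rough illustration of the object form described above (getUsers and getOrders are hypothetical asynchronous I/O functions, not part of the library):

    const async = require('async');

    async.parallel({
        users: cb => getUsers(cb),     // each property is a task taking a callback
        orders: cb => getOrders(cb)
    }, (err, results) => {
        if (err) return console.error(err);
        // results is an object keyed by task name: { users: ..., orders: ... }
        console.log(results.users, results.orders);
    });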
But you can use background threads (worker threads) to run these tasks in parallel, though I'm not sure this would fully solve your issue.
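A minimal worker_threads sketch, assuming a Node.js version where the module is available (the batch data and the uppercase "work" are placeholders for real CPU-heavy processing):

    const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

    if (isMainThread) {
        // main thread: spawn a worker and hand it a batch of (hypothetical) messages
        const worker = new Worker(__filename, { workerData: { messages: ['m1', 'm2'] } });
        worker.on('message', result => console.log('worker finished:', result));
        worker.on('error', err => console.error(err));
    } else {
        // worker thread: CPU-heavy processing runs here, off the main event loop
        const { messages } = workerData;
        const result = messages.map(m => m.toUpperCase());   // stand-in for real work
        parentPort.postMessage(result);
    }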
You can spin up multiple processes, but Node is meant to leverage the maximum out of an available core with a single process. Creating more processes will not necessarily make the overall throughput much higher.
If you have a multicore machine, generally it is advisable to have one process per core.
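A minimal sketch of that one-process-per-core idea using Node's built-in cluster module (pollQueue is a hypothetical function that would contain the SQS polling loop):

    const cluster = require('cluster');
    const os = require('os');

    if (cluster.isMaster) {
        // fork one worker process per CPU core
        os.cpus().forEach(() => cluster.fork());
        cluster.on('exit', worker => console.log(`worker ${worker.process.pid} exited`));
    } else {
        // each worker has its own event loop and could poll SQS independently
        console.log(`worker ${process.pid} started`);
        // pollQueue();   // hypothetical polling loop
    }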
The AWS JavaScript SDK for SQS works asynchronously, i.e. the process will continue to fetch more messages while I/O is happening for the first fetch.
Unless you are making the process synchronous by waiting, the process will retrieve messages from SQS continuously.
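A rough sketch of such a continuous, non-blocking polling loop with the AWS SDK v2 (the queue URL and region are placeholders; adjust them to your setup):

    const AWS = require('aws-sdk');
    const sqs = new AWS.SQS({ region: 'us-east-1' });   // placeholder region

    const params = {
        QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue', // placeholder
        MaxNumberOfMessages: 10,   // the cap mentioned in the question
        WaitTimeSeconds: 20        // long polling
    };

    function poll() {
        sqs.receiveMessage(params, (err, data) => {
            if (err) return console.error(err);
            (data.Messages || []).forEach(msg => {
                // process msg.Body here, then delete it so it is not redelivered
                sqs.deleteMessage({ QueueUrl: params.QueueUrl, ReceiptHandle: msg.ReceiptHandle }, () => {});
            });
            poll();   // issue the next receive; the call itself does not block the event loop
        });
    }

    poll();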

NodeJS: Parallelism of async module

I am following the async module's each method (https://github.com/caolan/async#each). It says the method iterates over the array in parallel. "Parallel" is the word that confuses me. AFAIK, in no way can JavaScript execute code in parallel, because it has a single-threaded model.
The examples shown for the each method focus on I/O scenarios. I am using the "each" method just to add up the numbers of an array. If parallelism exists, can I prove it using my example?
Thanks for reading.
The 'parallel' in the async documentation doesn't refer to 'parallel' in terms of concurrency (like multiple processes or threads being run at the same time), but 'parallel' in terms of each step being independent of the other steps (the opposite operation would be eachSeries, where each step is run only after the previous has finished).
The parallel version would only make sense if the steps perform some kind of I/O, which (because of Node's asynchronous nature) could run parallel to each other: if one step has to wait for I/O, the other steps can happily continue to send/receive data.
If the steps are mainly cpu-bound (that is, performing lots of calculations), it's not going to provide you any better performance because, like you say, Node runs the interpreter in a single thread, and that's not something that async changes.
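To make that concrete for the number-adding case from the question, here is a small sketch (it works, but because the iteratee is purely synchronous CPU work, the items are simply processed one after another on the single thread, so there is no speedup to observe):

    const async = require('async');

    const numbers = [1, 2, 3, 4];
    let sum = 0;

    async.each(numbers, (n, cb) => {
        // no timers, no I/O: this runs item by item on the one thread
        sum += n;
        cb();
    }, err => {
        if (err) return console.error(err);
        console.log(sum);   // 10, but no faster than a plain for loop
    });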
Like robertklep said, it is concurrent rather than parallel. You are not going to achieve much of a performance gain by running compute-heavy code in "parallel". It is useful when you have to do parallel I/O (communicating with an external web service for all the items of an array, for example).

How to avoid asynchronous waiting

I have an application where I need to query a database to get/put information. I can't do it synchronously as it would block my entire process until the function returns.
Basically I have a few functions that run one or more queries at certain points.
fun:
    stuff1
    stuff2
    stuff3
    query1
    stuff4
    query2
    stuff5
I could start the functions in separate threads, but then I would have to lock everything to prevent races (I think locking could be slow?).
I could start the queries asynchronously and monitor them, but then I would have to split my functions and use callbacks that would run when the query is over.
I am interested in a general solution, but my platform is POSIX and the database is (unfortunately) mysql.
What would you do ? How would you handle this ?
Thank you for your time.
There are patterns that have been known and used for quite some time, and I believe they have not changed.
Running independent functions: Use different threads
Running independent functions and then a function dependent on all of them: use different threads and join them at the end, i.e. synchronise them. I do not know about POSIX, but in .NET we have EventWaitHandle, which can wait for multiple threads and notify when all have finished.
Running functions where each depends on another: run them on a single background thread and chain the callbacks. Again, .NET offers Task<T> chaining, which makes reading and writing the code much simpler. jQuery now offers promises, which are the same idea (see the sketch below).
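For instance, a minimal promise-chaining sketch in JavaScript (stepOne, stepTwo and stepThree are hypothetical functions that each return a promise):

    stepOne()
        .then(resultOne => stepTwo(resultOne))     // runs only after stepOne resolves
        .then(resultTwo => stepThree(resultTwo))
        .then(finalResult => console.log(finalResult))
        .catch(err => console.error(err));         // one place to handle errors from any step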
Depends on how complicated the situation is. In a simple scenario breaking the work across multiple functions which are given as callbacks to the query will work - and is a valid solution. In a more complicated scenario, you need some dependency injection framework like spring.
You can create a new thread that handles a database query queue. This thread would hold a list containing the next actions to perform on the database and would be accessed through a function like MyDatabaseQueue.PerformActionWhenFree(Action a, Callback callMeBackWhenDone). This thread would be responsible for creating one query thread at a time. That way you can always receive more queries into the queue while having only one database query thread at a time.
Well if the queries are tightly coupled, you can simply start a parallel thread with pthread_create and run them sequentially on that thread. Thus, your main thread won't be blocked and you still won't need to employ any locks.
