This question is about how the event loop hands work to the thread pool.
Say I want to write a function that does a small task with no I/O involved. I want it to take a callback so that it is handed to the thread pool, runs concurrently with my main thread, and returns its result through the callback when it completes. I understand that this can be done by creating child processes (forking, etc.),
but I am a little confused and want to understand how exactly work executes concurrently in single-threaded Node for I/O operations and not for user-defined operations. What exactly happens in the event loop? Are all events passed to the thread pool, or how does it identify whether something is an I/O operation?
I am new to Node.js and totally confused.
Help would be appreciated :)
“Node.js manages its own threads for I/O” by using libuv for operations involving the network, file system, etc. libuv essentially creates a thread pool for I/O that varies in size based on platform. The V8 event loop is a separate thread that processes events in the queue. Those events map to a JavaScript function to execute with the event data. This is how asynchronous I/O is handled by Node.js.
Source: http://www.wintellect.com/blogs/dbanister/stop-fighting-node.js-in-the-enterprise
So each I/O operation executes outside the V8 event loop thread, which is why it runs concurrently.
I/O operations run efficiently because, as you mentioned, a thread pool is used: a group of threads that wait for incoming tasks from the event loop, execute them, and return the data to JavaScript callback functions.
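A minimal sketch of that distinction, assuming Node's built-in crypto.pbkdf2 as the stand-in for work that is handed to the libuv thread pool; userDefinedTask and the order array are hypothetical names used only for illustration. Giving a plain JavaScript function a callback does not move it off the main thread; only the built-in asynchronous APIs hand work off.

```javascript
const crypto = require('crypto');

// Records the order in which things happen (illustrative only).
const order = [];

// Plain JavaScript: runs on the main thread and calls back synchronously.
function userDefinedTask(callback) {
  let sum = 0;
  for (let i = 0; i < 1e6; i++) sum += i;
  callback(sum);
}

userDefinedTask(() => order.push('user callback'));
order.push('after user task');

// Built-in async API: the hashing is done off the main thread, and the
// callback is queued for the event loop once the work has finished.
const finished = new Promise((resolve, reject) => {
  crypto.pbkdf2('secret', 'salt', 1000, 32, 'sha256', (err, key) => {
    if (err) return reject(err);
    order.push('pbkdf2 callback');
    resolve(key);
  });
});
order.push('after pbkdf2 call');

finished.then(() => console.log(order.join(' -> ')));
```

The user callback runs before the line that follows the call, while the pbkdf2 callback runs only after the surrounding synchronous code has finished.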
As you already stated, the Node runtime is single-threaded.
Node is well suited to I/O-bound operations. It's less recommended for CPU-bound work, because that work blocks Node's event loop.
If you really want to do CPU-bound work asynchronously with Node, you can use a Node cluster, but you'll have to manage the communication between the processes yourself (a simple example is here: http://davidherron.com/blog/2014-07-03/easily-offload-your-cpu-intensive-nodejs-code-simple-express-based-rest-server), or you can use a child process: http://nodejs.org/api/child_process.html
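As a rough sketch of the child-process route, the following runs a CPU-bound function in a second node process and delivers the result through a Node-style callback. fibInChild and the inline fib script are hypothetical names for illustration; execFile and process.execPath are part of Node's standard API.

```javascript
const { execFile } = require('child_process');

// Runs a CPU-bound function in a separate node process and reports the
// result through a callback. fibInChild is a hypothetical helper.
function fibInChild(n, callback) {
  const script = `
    function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
    process.stdout.write(String(fib(${Number(n)})));
  `;
  // process.execPath is the node binary running this script;
  // the -e flag makes the child evaluate the script string.
  execFile(process.execPath, ['-e', script], (err, stdout) => {
    if (err) return callback(err);
    callback(null, Number(stdout));
  });
}

fibInChild(25, (err, result) => {
  if (err) throw err;
  console.log('fib(25) =', result);
});
console.log('main thread is free while the child computes');
```

Starting a process has a noticeable cost, so this only pays off when the task is heavy enough to matter.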
Like many before, I came across this diagram describing the architecture of NodeJS and since I'm new to the concept of Asynchronous programming, let me first share with you my (possibly flawed) understanding of Node's architecture.
To my knowledge, the main application script is first compiled into binary code using Chrome's V8 Engine. After this it moves through Node.JS bindings, which is a low-level API that allows the binary code to be handled by the event mechanism. Then, a single thread is allocated to the event loop, which loops infinitely, continually picks up the first (i.e. the oldest) event in the event queue and assigns a worker thread to process the event. After that, the callback is stored in the event queue, moved to a worker thread by the event-loop thread, and - depending on whether or not the callback function had another nested callback function - is either done or executes any of the callback functions that have not yet been processed.
Now here's what I don't get. The event loop is able to continually assign events to worker threads, but the code that the worker threads have to process is still CPU-blocking and the number of worker threads is still limited. In a synchronous process, wouldn't it be able to assign different pieces of code to different worker threads on the server's CPU?
Let's use an example:
    var fs = require('fs');

    fs.readFile('text.txt', function(err, data) {
        if (err) {
            console.log(err);
        } else {
            console.log(data.toString());
        }
    });

    console.log('This will probably be finished first.');
This example will log 'This will probably be finished first' and then output the data of the text.txt file later, since it's the callback function of the fs.readFile() function. Now I understand that NodeJS has a non-blocking architecture since the second block of code is finished earlier than the first even though it was called in a later stage. However, the total amount of time it takes for the program to be finished would still be the addition of the time it takes for each function to finish, right?
The only answer I can think of is that asynchronous programming allows for multithreading whereas synchronous programming does not. Otherwise, asynchronous event handling wouldn't actually be faster than synchronous programming, right?
Thanks in advance for your replies.
According to the documentation, Dart is single-threaded, but to perform two operations at a time we use Future objects, which work the same as threads.
Use Future objects (futures) to perform asynchronous operations.
If Dart is single-threaded, why does it allow asynchronous operations?
Note: Asynchronous operations are parallel operations which are called threads
You mentioned that:
Asynchronous operations are parallel operations which are called threads
First of all, asynchronous operations are not necessarily parallel or even concurrent. Asynchronous simply means that we do not want to block our flow of execution (thread) or wait for the response until certain work is done. The way we implement asynchronous operations decides whether they are parallel or concurrent.
Parallelism vs. Concurrency?
Parallelism is actually doing several things simultaneously. For example, you are walking and at the same time digesting your food. Both tasks run completely in parallel, at exactly the same time.
Concurrency, by contrast, is the illusion of parallelism. Tasks seem to be executed in parallel, but they aren't. It is like handling lots of things over a period but only doing one task at any specific moment. For example, you are walking and suddenly stop to tie your shoelace. After tying it, you start walking again.
Now coming to Dart: Future objects, along with the async and await keywords, are used to perform asynchronous tasks. Here, asynchronous doesn't mean that tasks are executed in parallel or concurrently with each other. Instead, in Dart even an asynchronous task is executed on the same thread, which means that while we wait for another task to complete, we continue executing our synchronous code. A Future object represents the result of a task that will be done at some point in the future.
If you really want to execute your task concurrently, consider using Isolates (which run in a separate thread and don't share memory with the main, or spawning, thread).
Why? Because it is a necessity. Some operations, like http requests or timers, are asynchronous in nature.
There are isolates which allow you to execute code in a different process. The difference to threads in other programming languages is that isolates do not share memory with each other (which would lead to concurrency issues), they only communicate through messages.
To receive these messages (or wrapped in a Future, the result of it), Dart uses an event loop.
The Event Loop and Dart
Are Futures in Dart threads?
Dart is single-threaded, but it can call native code (such as C/C++) to perform asynchronous operations, which can introduce new threads.
In Flutter, the Flutter engine is implemented in C++. It provides the low-level implementation of Flutter’s core API, including asynchronous tasks such as file and network I/O, which run on new threads underneath.
Like Dart, JavaScript is also single-threaded. I found this video very helpful for understanding the "single-threaded" thing: what the heck is event loop
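Since JavaScript shares this single-threaded model, a small JavaScript sketch (not Dart) can show what "asynchronous on one thread" looks like. fetchGreeting and log are hypothetical names; nothing here starts another thread, and the code after await is simply scheduled to run later on the same thread.

```javascript
const log = [];

// An async function that awaits an already-available value. The code
// after `await` is deferred, but it still runs on the same thread.
async function fetchGreeting() {
  log.push('fetchGreeting started');
  const value = await Promise.resolve('hello');
  log.push('fetchGreeting resumed');
  return value;
}

const pending = fetchGreeting();
log.push('synchronous code continues');

pending.then((value) => {
  log.push('got ' + value);
  console.log(log.join('\n'));
});
```

The synchronous line is logged before the function resumes, which is the "keep executing while we wait" behaviour described for futures.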
Here are a few notes:
Asynchronous doesn't mean multi-threaded. It means the code is not run at the same time. Usually asynchronous just means that it is scheduled to be run on the same thread (Isolate) after other tasks have finished.
Dart isn't actually single threaded. You can create another thread by creating another Isolate. However, within an Isolate the Dart code runs on a single thread and separate Isolates don't share memory. They can only communicate by messages.
A Future says that a value (or an error) will be returned at some point in the future. It doesn't say which thread the work is done on. Most futures are done on the current Isolate, but some futures (IO, for example) can be done on separate threads.
See this answer for links to more resources.
I have an article explaining this https://medium.com/#truongsinh/flutter-dart-async-concurrency-demystify-1cc739aaae57
In short, Flutter/Dart is not technically single-threaded, even though Dart code is executed in a single thread. Dart is a concurrent language with message passing pattern, that can take full advantage of modern multi-core architecture, without worrying about lock or mutex. Blocking in Dart can be either I/O-bound or CPU-bound, which should be solved, respectively, by Future and Dart’s Isolate/Flutter’s compute.
SQS allows at most MaxNumberOfMessages = 10 messages to be fetched at once
("The maximum number of messages to return. Amazon SQS never returns more messages than this value but may return fewer.").
Is there any way to run multiple parallel processes in Node.js that can handle many SQS messages?
Is there an npm package available for that?
Async might not be the right option, as its parallel operations are not really run in parallel.
https://github.com/caolan/async#paralleltasks-callback
parallel(tasks, [callback])
Run the tasks collection of functions in parallel, without waiting until the previous function has completed. If any of the functions pass an error to its callback, the main callback is immediately called with the value of the error. Once the tasks have completed, the results are passed to the final callback as an array.
Note: parallel is about kicking-off I/O tasks in parallel, not about parallel execution of code. If your tasks do not use any timers or perform any I/O, they will actually be executed in series. Any synchronous setup sections for each task will happen one after the other. JavaScript remains single-threaded.
It is also possible to use an object instead of an array. Each property will be run as a function and the results will be passed to the final callback as an object instead of an array. This can be a more readable way of handling results from parallel.
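The note quoted above can be illustrated without the async library. This is a minimal sketch using plain promises, where ioTask and cpuTask are hypothetical stand-ins and a timer simulates I/O: the I/O-style tasks all start before any of them ends, while the pure-computation tasks run strictly one after the other.

```javascript
const events = [];

// Simulated I/O: the wait happens outside JavaScript, so tasks overlap.
function ioTask(name, ms) {
  events.push('start ' + name);
  return new Promise((resolve) => {
    setTimeout(() => {
      events.push('end ' + name);
      resolve(name);
    }, ms);
  });
}

// Pure computation: no timer or I/O, so it runs to completion at once.
function cpuTask(name) {
  events.push('start ' + name);
  let sum = 0;
  for (let i = 0; i < 1e6; i++) sum += i;
  events.push('end ' + name);
  return Promise.resolve(name);
}

const allDone = Promise.all([ioTask('a', 30), ioTask('b', 20), ioTask('c', 10)])
  .then((ioResults) => {
    const ioEvents = events.splice(0); // take the I/O events and clear the log
    return Promise.all([cpuTask('x'), cpuTask('y')]).then((cpuResults) => ({
      ioEvents,
      cpuEvents: events.slice(),
      ioResults,
      cpuResults,
    }));
  });

allDone.then((summary) => console.log(summary));
```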
But you can use background threads or worker threads to run these tasks in parallel, though I'm not sure this would solve your issue fully.
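A minimal sketch of the worker-thread option, assuming Node's built-in worker_threads module; runInWorker and the summing loop are hypothetical placeholders for real message-processing code. The worker source is passed inline with eval: true so that the example stays in one file.

```javascript
const { Worker } = require('worker_threads');

// Runs a CPU-bound loop on a separate thread and resolves with its result.
function runInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `
      const { parentPort, workerData } = require('worker_threads');
      let sum = 0;
      for (let i = 1; i <= workerData; i++) sum += i;
      parentPort.postMessage(sum);
      `,
      { eval: true, workerData: n }
    );
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

runInWorker(100).then((sum) => console.log('sum from worker:', sum));
```

The worker and the main thread do not share variables here; they only exchange messages.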
You can spin up multiple processes, but Node is meant to get the most out of an available core with a single process. Creating more processes will not necessarily make the overall throughput much higher.
If you have a multicore machine, generally it is advisable to have one process per core.
The AWS JavaScript SDK for SQS works asynchronously, i.e., the process will continue to fetch more messages while I/O is happening for the first fetch.
Unless you are making the process synchronous by waiting, the process will retrieve messages from SQS continuously.
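To make the "keep fetching while earlier I/O is pending" idea concrete, here is a sketch that runs several polling loops at once on one thread. It does not use the AWS SDK: fakeQueue and receiveBatch are hypothetical in-memory stand-ins for a queue and its receive call, and a real queue would not signal "finished" with an empty batch.

```javascript
// Hypothetical in-memory stand-in for a queue; this is not the AWS SDK.
const fakeQueue = Array.from({ length: 35 }, (_, i) => 'message-' + i);

// Mimics a receive call: resolves after a short delay with at most `max` messages.
function receiveBatch(max) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(fakeQueue.splice(0, max)), 5);
  });
}

// One polling loop: keep fetching batches until the fake queue is empty.
async function poller(id, handled) {
  for (;;) {
    const batch = await receiveBatch(10);
    if (batch.length === 0) return;
    for (const message of batch) handled.push({ poller: id, message });
  }
}

// Start several loops at once; their waits overlap on a single thread.
async function drain(pollerCount) {
  const handled = [];
  const loops = [];
  for (let id = 0; id < pollerCount; id++) loops.push(poller(id, handled));
  await Promise.all(loops);
  return handled;
}

const drained = drain(3);
drained.then((handled) => console.log('handled', handled.length, 'messages'));
```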
Following up on ideas from these two previous questions I had:
When a goroutine blocks on I/O how does the scheduler identify that it has stopped blocking?
When doing asynchronous I/O, how does the kernel determine if an I/O operation is completed?
I've been looking into nodejs recently. It's advertised as "single threaded", which is partially true since all your JS does run on one thread, but from what I've read, in the background, node achieves this by delegating the I/O tasks to the kernel so that it doesn't get stuck having to wait for the response.
What I'm having difficulty understanding is how this is any different than the paradigms where you explicitly are creating a thread per request.
Could someone explain the differences in depth?
This would be true if node created one thread for each I/O request. But, of course, it doesn't do that. It has an I/O engine that understands the best way to do I/O on each platform.
What nodejs hides from you is not some naive implementation where a scheduling entity waits for each request to complete, but a sophisticated implementation that understands the optimal way to do I/O on every platform on which it is implemented.
Updates:
If both approaches need the kernel for I/O aren't they both creating a kernel thread per request?
No. There are lots of ways to use the kernel for I/O that don't require a kernel thread per request. They differ from platform to platform. Windows has IOCP. Linux has epoll. And so on.
If nodejs somehow is using a fixed amount of threads and queueing the I/O operations, isn't that slower than a thread per request?
No, it's typically much faster for a variety of reasons that depend on the specifics of each platform. Here are a few advantages:
You can avoid "thundering herds" when lots of I/O completes at once. Instead, you can wake just the number of threads that can usefully run at the same time.
You can avoid needing lots of context switches to get all the different threads to execute. Instead, each thread can handle completion after completion.
You don't have to put each thread on a wait queue for each I/O operation. Instead, you can use a single wait queue for the group of threads.
Just to give you an idea of how significant it can be, consider the difference between using a thread per I/O and using epoll on Linux. If you use a thread per I/O, that means each I/O operation requires a thread to place itself on a wait queue, that thread to block, that thread to be unblocked, a context switch to occur to that thread, and that thread to remove itself from the wait queue.
By contrast, with epoll, a single thread can service any number of I/O completions without having to be rescheduled or added to or removed from a wait queue for each I/O. Similarly, a thread can issue a number of I/O requests without being descheduled. This difference is massive.
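In Node, sockets are handled through this kind of readiness notification (epoll on Linux, via libuv), so the single-thread case can be sketched as follows. echoMany is a hypothetical helper: it starts a local echo server, opens many client connections, and counts the replies, all from one JavaScript thread.

```javascript
const net = require('net');

// Opens `clientCount` TCP connections to a local echo server and resolves
// with the number of replies received. No thread is created per connection.
function echoMany(clientCount) {
  return new Promise((resolve, reject) => {
    let replies = 0;
    const server = net.createServer((socket) => socket.pipe(socket));
    server.on('error', reject);
    // Port 0 asks the OS for any free port.
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      for (let i = 0; i < clientCount; i++) {
        const client = net.connect(port, '127.0.0.1', () => client.write('ping ' + i));
        client.on('error', reject);
        client.once('data', () => {
          client.end();
          replies++;
          if (replies === clientCount) {
            server.close(); // stop accepting; open sockets finish on their own
            resolve(replies);
          }
        });
      }
    });
  });
}

echoMany(50).then((n) => console.log(n + ' connections served on one thread'));
```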
Does an asynchronous call always create a new thread? What is the difference between the two?
Does an asynchronous call always create or use a new thread?
Wikipedia says:
In computer programming, asynchronous events are those occurring independently of the main program flow. Asynchronous actions are actions executed in a non-blocking scheme, allowing the main program flow to continue processing.
I know async calls can be done on single threads? How is this possible?
Whenever the operation that needs to happen asynchronously does not require the CPU to do work, that operation can be done without spawning another thread. For example, if the async operation is I/O, the CPU does not have to wait for the I/O to complete. It just needs to start the operation, and can then move on to other work while the I/O hardware (disk controller, network interface, etc.) does the I/O work. The hardware lets the CPU know when it's finished by interrupting the CPU, and the OS then delivers the event to your application.
Frequently, higher-level abstractions and APIs don't expose the underlying asynchronous APIs available from the OS and the underlying hardware. In those cases it's usually easier to create threads to do asynchronous operations, even if the spawned thread is just waiting on an I/O operation.
If the asynchronous operation requires the CPU to do work, then generally that operation has to happen in another thread in order for it to be truly asynchronous. Even then, it will really only be asynchronous if there is more than one execution unit.
This question is darn near too general to answer.
In the general case, an asynchronous call does not necessarily create a new thread. That's one way to implement it, with a pre-existing thread pool or external process being other ways. It depends heavily on language, object model (if any), and run time environment.
Asynchronous just means the calling thread doesn't sit and wait for the response, nor does the asynchronous activity happen in the calling thread.
Beyond that, you're going to need to get more specific.
No, asynchronous calls do not always involve threads.
They typically do start some sort of operation which continues in parallel with the caller. But that operation might be handled by another process, by the OS, by other hardware (like a disk controller), by some other computer on the network, or by a human being. Threads aren't the only way to get things done in parallel.
JavaScript is single-threaded and asynchronous. When you use XmlHttpRequest, for example, you provide it with a callback function that will be executed asynchronously when the response returns.
John Resig has a good explanation of the related issue of how timers work in JavaScript.
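A small sketch of the timer behaviour in question, with hypothetical names (timeline, timerFired): a timer callback requested for "0 ms" cannot run until the single thread is free, so it fires only after the busy loop has finished.

```javascript
const timeline = [];
const start = Date.now();

// Ask for a callback as soon as possible.
const timerFired = new Promise((resolve) => {
  setTimeout(() => {
    timeline.push('timer callback');
    resolve(Date.now() - start);
  }, 0);
});

// Block the only thread for about 50 ms.
timeline.push('busy loop start');
while (Date.now() - start < 50) {
  // spin
}
timeline.push('busy loop end');

timerFired.then((elapsed) => {
  console.log(timeline.join(' -> '), '(timer ran after ' + elapsed + ' ms)');
});
```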
Multithreading refers to more than one operation happening in the same process, while async programming can spread across processes. For example, if my operation calls a web service, the thread need not wait until the web service returns. Here we use async programming, which allows the thread not to wait for a process on another machine to complete. When the response from the web service starts arriving, it can interrupt the main thread to say that the web service has finished processing the request. Now the main thread can process the result.
Windows always had asynchronous processing since the non preemptive times (versions 2.13, 3.0, 3.1, etc) using the message loop, way before supporting real threads. So to answer your question, no, it is not necessary to create a thread to perform asynchronous processing.
Asynchronous calls don't even need to occur on the same system/device as the one invoking the call. So if the question is, does an asynchronous call require a thread in the current process, the answer is no. However, there must be a thread of execution somewhere processing the asynchronous request.
Thread of execution is a vague term. In cooperative tasking systems such as the early Macintosh and Windows OSes, the thread of execution could simply be the same process that made the request, running with another stack, instruction pointer, etc. However, when people talk about asynchronous calls, they typically mean calls that are handled by another thread if it is intra-process (i.e. within the same process) or by another process if it is inter-process.
Note that inter-process (or interprocess) communication (IPC) is commonly generalized to include intra-process communication, since the techniques for locking, and synchronizing data are usually the same regardless of what process the separate threads of execution run in.
Some systems allow you to take advantage of the concurrency in the kernel for some facilities using callbacks. For a rather obscure instance, asynchronous I/O callbacks were used to implement non-blocking internet servers back in the non-preemptive multitasking days of Mac System 6-8.
This way you have concurrent execution streams "in" your program without threads as such.
Asynchronous just means that you don't block your program waiting for something (function call, device, etc.) to finish. It can be implemented in a separate thread, but it is also common to use a dedicated thread for synchronous tasks and communicate via some kind of event system and thus achieve asynchronous-like behavior.
There are examples of single-threaded asynchronous programs. Something like:
    ...do something
    ...send some async request
    while (not done)
        ...do something else
        ...do async check for results
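One way to make that loop runnable on a single thread is sketched below in JavaScript. The names (runWithPolling, otherWorkDone) are hypothetical, and a timer simulates the async request. A literal while loop would never let the result arrive in JavaScript, so each iteration yields back to the event loop with setImmediate.

```javascript
// Runs the loop above on one thread. The "async request" is simulated
// with a timer; in real code it would be an I/O call.
function runWithPolling() {
  return new Promise((resolve) => {
    let result = null; // filled in when the request completes
    let otherWorkDone = 0;

    // ...send some async request
    setTimeout(() => {
      result = 'response';
    }, 20);

    function step() {
      // ...do async check for results
      if (result !== null) return resolve({ result, otherWorkDone });
      // ...do something else
      otherWorkDone++;
      // Yield to the event loop so the timer callback gets a chance to run.
      setImmediate(step);
    }
    step();
  });
}

const polled = runWithPolling();
polled.then(({ result, otherWorkDone }) => {
  console.log(result, 'arrived after', otherWorkDone, 'units of other work');
});
```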
The nature of asynchronous calls is such that, if you want the application to continue running while the call is in progress, you will either need to spawn a new thread, or at least utilise another thread that you have created solely for the purpose of handling asynchronous callbacks.
Sometimes, depending on the situation, you may want to invoke an asynchronous method but make it appear to the user to be synchronous (i.e. block until the asynchronous method has signalled that it is complete). This can be achieved through Win32 APIs such as WaitForSingleObject.