Async funciton for CPU-specific operation - multithreading

Things like IO / networking operations can be executed asynchronously because they have their own hardware / drivers and because of that, no separate thread has to be created.
But what happens if the asynchronous function contains something that can only be executed by the CPU?
Can this situation be handled without creating a new thread?

Related

Async and scheduling - how do libraries avoid blocking at the lowest level?

I've been using various concurrency constructs for a while now without much consideration for how all the magic happens, which has recently made me increasingly uneasy.
In an attempt to remedy this ... feeling, I have been reading up on how async works under the hood. When I say async, in this case I'm referring to userland / greenthread / cooperative multitasking, although I assume some of the concepts will also apply to traditional OS managed threads insofar as a scheduler and workers are involved.
I see how a worker can suspend itself and let other workers execute, but at the lowest level in non-blocking library code, how does the scheduler know when a previously suspended worker's job is done and to wake up that worker?
For example if you fire up a worker in some sort of async block and perform an operation that would normally block (e.g. HTTP request, SQL query, other I/O), then even though your calling code is async, that operation (library code) better play nice with your async framework or you've effectively defeated the purpose of using it and blocked your scheduler from calling other waiting operations (the, What Color is Your Function problem) while it waits for your blocking call, which was executed inside your non-blocking calling code, to complete.
So now we've got async code calling other async library code, and now I'm asking myself the question all over again - how does the async library code know when to suspend and resume operation?
The idea of firing off a HTTP request, moving on, and returning later to check for results is weird to think about for me - not conceptually but from an implementation standpoint.
How do you perform a partial operation, e.g. sending TCP packets and then continuing with the rest of the program execution, only to come back later and check if results have been delivered. Delivered to what? A socket?
Now we're another layer deep and you are using socket selects to avoid creating threads and blocking, but, again...
how do those sockets start an operation, move on before completion, and then how does select know when data is available?
Are you continually checking some buffer to see if bytes have been delivered in an infinite loop and moving on if not?
Anyhow - I think you see where I'm going here....
I focused mainly on HTTP as a motivating example, but the same question applies for any normally blocking operations - how does it all work at the bottom?
Here are some of the resources I found helpful while researching the topic and which informed this question:
David Beazley's excellent video Build Your Own Async where he walks you through a simple implementation of a scheduler which fire callbacks and suspend execution by sleeping on a waiting queue. I found this video tremendously instructive, but it stops a bit short in that it shows you how using an async sleep frees up the scheduler to execute other workers, but doesn't really go into what would happen when you call code in those workers that itself must be non-blocking so it plays nice with the scheduler.
How does non-blocking IO work under the hood - This got me further along in my understanding, but still left with a few uncertainties.

If blocking operations are asynchronously handled and results in a queue of blocking processes, how does asynchronous programming speed things up?

Like many before, I came across this diagram describing the architecture of NodeJS and since I'm new to the concept of Asynchronous programming, let me first share with you my (possibly flawed) understanding of Node's architecture.
To my knowledge, the main application script is first compiled into binary code using Chrome's V8 Engine. After this it moves through Node.JS bindings, which is a low-level API that allows the binary code to be handled by the event mechanism. Then, a single thread is allocated to the event loop, which loops infinitely, continually picks up the first (i.e. the oldest) event in the event queue and assigns a worker thread to process the event. After that, the callback is stored in the event queue, moved to a worker thread by the event-loop thread, and - depending on whether or not the callback function had another nested callback function - is either done or executes any of the callback functions that have not yet been processed.
Now here's what I don't get. The event-loop is able to continually assign events to worker threads, but the code that the worker threads have to process is still CPU blocking and the amount of worker threads is still limited. In a synchronous process, wouldn't it be able to assign different pieces of code to different worker threads on the server's CPU?
Let's use an example:
var fs = require('fs');
fs.readFile('text.txt', function(err, data) {
if(err) {
console.log(err);
} else {
console.log(data.toString());
}
});
console.log('This will probably be finished first.');
This example will log 'This will probably be finished first' and then output the data of the text.txt file later, since it's the callback function of the fs.readFile() function. Now I understand that NodeJS has a non-blocking architecture since the second block of code is finished earlier than the first even though it was called in a later stage. However, the total amount of time it takes for the program to be finished would still be the addition of the time it takes for each function to finish, right?
The only answer I can think of is that asynchronous programming allows for multithreading whereas synchronous programming does not. Otherwise, asynchronous event handling wouldn't actually be faster than synchronous programming, right?
Thanks in advance for your replies.

does user defined callback function uses thread pool in node.js?

This question is to understand how event loop calls thread pool to process task.
Say,
I want to create a function (say to process small task) not any i/o operation, i want that to process using a callback function, so that it can call thread pool and task can be concurrent with my main thread, and return result in callback after completion. I have understanding that it can be done by creating child processes(forking etc),
but, I am little confused and want to understand how exactly is process executes concurrently in single threaded node in i/o operation and not in user defined operation. What exactly happens in event loop, will all event be passed to thread pool or how it identifies if it is I/O operation??
I am new at node.js and totally confused.
Help would be appreciated :)
“Node.js manages its own threads for I/O” by using libuv for operations involving the network, file system, etc. libuv essentially creates a thread pool for I/O that varies in size based on platform. The V8 event loop is a separate thread that processes events in the queue. Those events map to a JavaScript function to execute with the event data. This is how asynchronous I/O is handled by Node.js.
Source: http://www.wintellect.com/blogs/dbanister/stop-fighting-node.js-in-the-enterprise
So each I/O operation executes outside V8 event loop thread, that's why it runs concurrently.
I/O operations run efficiently because, as you mentioned, a thread pool is used - a group of threads that "wait" for incoming tasks from V8 event loop, execute them, and return data to JavaScript callback functions.
As you already stated Node runtime is single threaded.
Node is well suited for IO bound operation. It's less recommended for CPU bound because it will block Node's event loop.
If you really want to do CPU bound work async with node you can achieve that using a nodes cluster but you'll have to manage the communication between them. (A simple example here - http://davidherron.com/blog/2014-07-03/easily-offload-your-cpu-intensive-nodejs-code-simple-express-based-rest-server) or using chiled process - http://nodejs.org/api/child_process.html

What is the difference between asynchronous I/O and asynchronous function?

Node.js, is an asynchronous I/O. what does this actually mean?
Is there different between I create an async function by spawning another thread to do the process?
e.g.
void asyncfuntion(){
Thread apple = new Thread(){
public void run(){
...do stuff
}
}
apple.start()
}
If there is difference, can I do a asynchronous I/O in javascript?
Asynchronous I/O
Asynchronous I/O (from Wikipedia)
Asynchronous I/O, or non-blocking I/O, is a form of input/output
processing that permits other processing to continue before the
transmission has finished.
What this means is, if a process wants to do a read() or write(), in a synchronous call, the process would have to wait until the hardware finishes the physical I/O so that it can be informed of the success/failure of the I/O operation.
On asynchronous mode, once the process issues a read/write I/O asynchronously, the system calls is returned immediately once the I/O has been passed down to the hardware or queued in the OS/VM. Thus the execution of the process isn't blocked (hence why it's called non-blocking I/O) since it doesn't need to wait for the result from the system call, it will receive the result later.
Asynchronous Function
Asynchronous functions is a function that returns the data back to the caller by a means of event handler (or callback functions). The callback function can be called at any time (depending on how long it takes the asynchronous function to complete). This is unlike the synchronous function, which will execute its instructions before returning a value.
...can I do a asynchronous I/O in java?
Yes, Java NIO provides non-blocking I/O support via Selector's. Also, Apache MINA, is a networking framework that also includes non-blocking I/O. A related SO question answers that question.
There are several great articles regarding asynchronous code in node.js:
Asynchronous Code Design with Node.js
Understanding Event-driven Programming
Understanding event loops and writing great code for Node.js
In addition to #The Elite Gentleman's answer, node doesn't spawn threads for asynchronous I/O functions. Everything under node runs in a single threaded event loop. That is why it is really important to avoid synchronized versions of some I/O functions unless absolutely necessary, like fs.readSync
You may read this excellent blog post for some insight: http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
I was investigating the same question since this async IO pattern was very new to me. I found this talk on infoq.com which made me very happy. The guy explains very well where async IO actually resides (the OS -> the kernel) and how it is embedded in node.js as the primary idiom for doing IO. Enjoy!
http://www.infoq.com/presentations/Nodejs-Asynchronous-IO-for-Fun-and-Profit
node.js enables a programmer to do asynchronous IO by forcing the usage of callbacks. Now callbacks are similar to the old asynchronous functions that we have been using for a long time to handle DOM events in javascript!
e.g.
asyncIOReadFile(fileName, asynFuncReadTheActualContentsAndDoSomethingWithThem);
console.log('IDontKnowWhatIsThereInTheFileYet')
function asynFuncReadTheActualContentsAndDoSomethingWithThem(fileContents) {
console.log('Now I know The File Contents' + fileContents)
}
//Generally the output of the above program will be following
'IDontKnowWhatIsThereInTheFileYet'
//after quite a bit of time
'Now I know The File Contents somebinarystuff'
From https://en.wikipedia.org/wiki/Asynchronous_I/O
Synchronous blocking I/O
A simple approach to I/O would be to start the access and then wait for it to complete.
Would block the progress of a program while the communication is in progress, leaving system resources idle.
Asynchronous I/O
Alternatively, it is possible to start the communication and then perform processing that does not require that the I/O be completed.
Any task that depends on the I/O having completed ... still needs to wait for the I/O operation to complete, and thus is still blocked,
but other processing that does not have a dependency on the I/O operation can continue.
Example:
https://en.wikipedia.org/wiki/Node.js
Node.js has an event-driven architecture capable of asynchronous I/O

Asynchronous vs Multithreading - Is there a difference?

Does an asynchronous call always create a new thread? What is the difference between the two?
Does an asynchronous call always create or use a new thread?
Wikipedia says:
In computer programming, asynchronous events are those occurring independently of the main program flow. Asynchronous actions are actions executed in a non-blocking scheme, allowing the main program flow to continue processing.
I know async calls can be done on single threads? How is this possible?
Whenever the operation that needs to happen asynchronously does not require the CPU to do work, that operation can be done without spawning another thread. For example, if the async operation is I/O, the CPU does not have to wait for the I/O to complete. It just needs to start the operation, and can then move on to other work while the I/O hardware (disk controller, network interface, etc.) does the I/O work. The hardware lets the CPU know when it's finished by interrupting the CPU, and the OS then delivers the event to your application.
Frequently higher-level abstractions and APIs don't expose the underlying asynchronous API's available from the OS and the underlying hardware. In those cases it's usually easier to create threads to do asynchronous operations, even if the spawned thread is just waiting on an I/O operation.
If the asynchronous operation requires the CPU to do work, then generally that operation has to happen in another thread in order for it to be truly asynchronous. Even then, it will really only be asynchronous if there is more than one execution unit.
This question is darn near too general to answer.
In the general case, an asynchronous call does not necessarily create a new thread. That's one way to implement it, with a pre-existing thread pool or external process being other ways. It depends heavily on language, object model (if any), and run time environment.
Asynchronous just means the calling thread doesn't sit and wait for the response, nor does the asynchronous activity happen in the calling thread.
Beyond that, you're going to need to get more specific.
No, asynchronous calls do not always involve threads.
They typically do start some sort of operation which continues in parallel with the caller. But that operation might be handled by another process, by the OS, by other hardware (like a disk controller), by some other computer on the network, or by a human being. Threads aren't the only way to get things done in parallel.
JavaScript is single-threaded and asynchronous. When you use XmlHttpRequest, for example, you provide it with a callback function that will be executed asynchronously when the response returns.
John Resig has a good explanation of the related issue of how timers work in JavaScript.
Multi threading refers to more than one operation happening in the same process. While async programming spreads across processes. For example if my operations calls a web service, The thread need not wait till the web service returns. Here we use async programming which allows the thread not wait for a process in another machine to complete. And when it starts getting response from the webservice it can interrupt the main thread to say that web service has completed processing the request. Now the main thread can process the result.
Windows always had asynchronous processing since the non preemptive times (versions 2.13, 3.0, 3.1, etc) using the message loop, way before supporting real threads. So to answer your question, no, it is not necessary to create a thread to perform asynchronous processing.
Asynchronous calls don't even need to occur on the same system/device as the one invoking the call. So if the question is, does an asynchronous call require a thread in the current process, the answer is no. However, there must be a thread of execution somewhere processing the asynchronous request.
Thread of execution is a vague term. In a cooperative tasking systems such as the early Macintosh and Windows OS'es, the thread of execution could simply be the same process that made the request running another stack, instruction pointer, etc... However, when people generally talk about asynchronous calls, they typically mean calls that are handled by another thread if it is intra-process (i.e. within the same process) or by another process if it is inter-process.
Note that inter-process (or interprocess) communication (IPC) is commonly generalized to include intra-process communication, since the techniques for locking, and synchronizing data are usually the same regardless of what process the separate threads of execution run in.
Some systems allow you to take advantage of the concurrency in the kernel for some facilities using callbacks. For a rather obscure instance, asynchronous IO callbacks were used to implement non-blocking internet severs back in the no-preemptive multitasking days of Mac System 6-8.
This way you have concurrent execution streams "in" you program without threads as such.
Asynchronous just means that you don't block your program waiting for something (function call, device, etc.) to finish. It can be implemented in a separate thread, but it is also common to use a dedicated thread for synchronous tasks and communicate via some kind of event system and thus achieve asynchronous-like behavior.
There are examples of single-threaded asynchronous programs. Something like:
...do something
...send some async request
while (not done)
...do something else
...do async check for results
The nature of asynchronous calls is such that, if you want the application to continue running while the call is in progress, you will either need to spawn a new thread, or at least utilise another thread you that you have created solely for the purposes of handling asynchronous callbacks.
Sometimes, depending on the situation, you may want to invoke an asynchronous method but make it appear to the user to be be synchronous (i.e. block until the asynchronous method has signalled that it is complete). This can be achieved through Win32 APIs such as WaitForSingleObject.

Resources