Is Play an asynchronous or a non-blocking framework? (multithreading)

I have read several tutorials, and everywhere Play is described as "non-blocking". I am confused about whether, by "non-blocking", they mean asynchronous or non-blocking I/O.
As I understand it, when an HTTP request is received by a Play server, the server assigns a dedicated thread to process that request and holds that thread until it returns a response. So how is it non-blocking?
Also, by "asynchronous", do they mean that the main thread can spawn a new thread, delegate its work to it, and run the next statement? (For that we would need to use Akka.)

To understand this, you need to understand how Futures work in Scala. Futures do not stop the execution of the program; they are designed to be used in a non-blocking way. I would suggest you read up on this: http://docs.scala-lang.org/overviews/core/futures.html
Play's asynchronous library is built on top of Scala's Futures, so once you're done with that, you will have enough knowledge to answer your own question. If you need more information about how Futures are used in Play, also read this:
https://www.playframework.com/documentation/2.5.x/ScalaAsync
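As a rough illustration of the "a future does not stop the program" idea, here is a minimal sketch in Python rather than Scala. It uses Python's concurrent.futures as an analogy only; it is not Play's or Scala's API, and slow_square and the gate event are made-up names for the demo.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def slow_square(x, gate):
    # Hypothetical slow task: waits until the caller opens the gate.
    gate.wait()
    return x * x


def demo():
    events = []
    gate = threading.Event()
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_square, 7, gate)
        # Register what should happen once the result exists.
        future.add_done_callback(lambda f: events.append(("callback", f.result())))
        # The caller keeps going; the future is not finished yet.
        events.append(("caller continues", future.done()))
        gate.set()  # let the task finish
    # Leaving the with-block waits for the worker, so the callback has run.
    return events
```

The caller records its own progress before the result is available, and the callback fires later when the value is ready.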

Related

What is meant by "asynchronous I/O primitives" in Node.js?

I was going through Node.js documentation, and couldn't understand the line :
A Node.js app runs in a single process, without creating a new thread for every request. Node.js provides a set of asynchronous I/O primitives in its standard library that prevent JavaScript code from blocking and generally, libraries in Node.js are written using non-blocking paradigms, making blocking behavior the exception rather than the norm.
Source : Introduction to node js
I couldn't understand specifically:
[...] Node.js provides a set of asynchronous I/O primitives in its standard library that prevent JavaScript code from blocking [..]
Does it simply mean that it has built-in functionality that allows it to work asynchronously?
If not, then what are these asynchronous I/O primitives? If anyone could provide a link for better understanding or for getting started with Node.js, that would be really great.
P.S.: I have practical experience with Node.js, so I understand how its code will work but not why it works. I want to understand the theory so that I can understand what is actually going on in the background.
Does it simply mean that it has built-in functionality that allows it to work asynchronously?
Yes, that's basically what it means.
In a "traditional" one-thread-per-connection* model you accept a connection and then hand off the handling of that request to a thread (either a freshly started one or from a pool, doesn't really change much) and do all work related to that connection on that one thread, including sending the response.
This can easily be done with synchronous/blocking I/O: have a read method that simply returns the read bytes and a write method that blocks until the writing is done.
This does, however, mean that the thread handling that request cannot do anything else, and also that you need many threads to be able to handle many concurrent connections/requests. And since I/O operations take a relatively huge amount of time (when measured against the speed of memory access and computation), most of these threads will be waiting for one I/O operation or another most of the time.
Having asynchronous I/O and an event-based core architecture means that when an I/O operation is initiated, the CPU can immediately go on to processing whatever action needs to be done next, which will likely be related to an entirely different request.
Therefore one can handle many more requests on a single thread.
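To make the "many requests on a single thread" point concrete, here is a minimal sketch in Python's asyncio rather than Node.js. The handler name and the 0.1 s asyncio.sleep (standing in for a non-blocking I/O wait) are assumptions for the demo.

```python
import asyncio
import threading
import time


async def handle_request(request_id, log):
    # asyncio.sleep stands in for a non-blocking I/O wait (e.g. a database call).
    await asyncio.sleep(0.1)
    log.append((request_id, threading.get_ident()))
    return f"response {request_id}"


async def serve(n):
    log = []
    # All handlers are started together; each one yields while it "waits on I/O".
    responses = await asyncio.gather(*(handle_request(i, log) for i in range(n)))
    return responses, log


def run_demo(n=50):
    started = time.perf_counter()
    responses, log = asyncio.run(serve(n))
    elapsed = time.perf_counter() - started
    thread_ids = {tid for _, tid in log}
    return responses, thread_ids, elapsed
```

With 50 simulated requests, handling them one after another with blocking waits would take about 5 seconds; here they overlap on one thread, so the total stays close to a single wait.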
The "primitives" just means that basic I/O operations such as "read bytes" and "write bytes" to/from network connections or files are provided as asynchronous operations and higher-level operations need to be built on top of those (again in an asynchronous way, to keep the benefits).
As a side note: many other programming environments have either had asynchronous I/O APIs for a long time or have gotten them in recent years. The one thing that sets Node.js apart is that it's the default option: if you're reading from a socket or a file, then doing it asynchronously is what is "normal" and blocking calls are the big exception. This means that the entire ecosystem surrounding Node.js (i.e. almost all third-party libraries) works with that assumption in mind and is also written in that same manner.
So while Java, for example, has asynchronous I/O you lose that advantage as soon as you use any I/O related libraries that only support blocking I/O.
* I use connection/request interchangeably in this answer, under the assumption that each connection contains a single request. That assumption is usually wrong these days (most common protocols allow multiple request/response pairs in a single connection), but handling multiple requests on a single connection doesn't fundamentally change anything about this answer.
It means Node.js doesn't halt on input/output operations. Suppose you need to do some task that has a blocking condition, e.g. "if the space key is pressed, do this" or "while the Esc key isn't pressed, keep taking input". Because Node.js is single-threaded, a blocking implementation would stop all other operations while it waits for that condition to be met. Asynchronous I/O lets the application keep doing other tasks until that one finishes. That's why we use await to get the value out of a promise inside an async function in Node.js: when the data is ready, execution resumes at the line containing the await and continues with the value.

Async and scheduling - how do libraries avoid blocking at the lowest level?

I've been using various concurrency constructs for a while now without much consideration for how all the magic happens, which has recently made me increasingly uneasy.
In an attempt to remedy this ... feeling, I have been reading up on how async works under the hood. When I say async, in this case I'm referring to userland / greenthread / cooperative multitasking, although I assume some of the concepts will also apply to traditional OS managed threads insofar as a scheduler and workers are involved.
I see how a worker can suspend itself and let other workers execute, but at the lowest level in non-blocking library code, how does the scheduler know when a previously suspended worker's job is done and to wake up that worker?
For example, if you fire up a worker in some sort of async block and perform an operation that would normally block (e.g. an HTTP request, SQL query, or other I/O), then even though your calling code is async, that operation (library code) had better play nice with your async framework. Otherwise you've effectively defeated the purpose of using it: the scheduler is blocked from running other waiting operations (the "What Color is Your Function" problem) while it waits for your blocking call, which was executed inside your non-blocking calling code, to complete.
So now we've got async code calling other async library code, and now I'm asking myself the question all over again - how does the async library code know when to suspend and resume operation?
The idea of firing off an HTTP request, moving on, and returning later to check for results is weird for me to think about, not conceptually but from an implementation standpoint.
How do you perform a partial operation, e.g. sending TCP packets and then continuing with the rest of the program execution, only to come back later and check if results have been delivered? Delivered to what? A socket?
Now we're another layer deep and you are using socket selects to avoid creating threads and blocking, but, again...
how do those sockets start an operation, move on before completion, and then how does select know when data is available?
Are you continually checking some buffer to see if bytes have been delivered in an infinite loop and moving on if not?
Anyhow - I think you see where I'm going here....
I focused mainly on HTTP as a motivating example, but the same question applies for any normally blocking operations - how does it all work at the bottom?
Here are some of the resources I found helpful while researching the topic and which informed this question:
David Beazley's excellent video Build Your Own Async, where he walks you through a simple implementation of a scheduler which fires callbacks and suspends execution by sleeping on a waiting queue. I found this video tremendously instructive, but it stops a bit short: it shows you how using an async sleep frees up the scheduler to execute other workers, but doesn't really go into what happens when you call code in those workers that itself must be non-blocking so it plays nice with the scheduler.
How does non-blocking IO work under the hood - This got me further along in my understanding, but still left with a few uncertainties.

Does Node.js actually use multiple threads underneath?

After all the literature I've read on Node.js, I still come back to the question: does Node.js itself make use of multiple threads under the hood? I think the answer is yes, because if we take the simple async file read example, something has to be doing the work of reading the file. If the main event loop of Node is not doing this work, then there must be a POSIX thread running somewhere that takes care of the file reading and, upon completion, places the callback in the event loop to be executed. So when we say Node.js runs in one thread, do we actually mean that only the event loop of Node.js is a single thread? Or am I missing something here?
To a Javascript program on node.js, there is only one thread.
If you're looking for technicalities, node.js is free to use threads to solve asynchronous I/O if the underlying operating system requires it.
The important thing is to never break the "there is only one thread" abstraction to the Javascript program. If there are more threads, all they can do is queue up work for the main thread in the Javascript program, they can never execute any Javascript code themselves.
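The "other threads may only queue up work for the main thread" rule can be sketched like this. It is a toy model in Python, not Node.js internals; blocking_read, worker and run_event_loop are made-up names for the illustration.

```python
import queue
import threading


def blocking_read(name):
    # Stand-in for a blocking call made off the main thread (hypothetical).
    return f"contents of {name}"


def worker(name, callback, completed):
    result = blocking_read(name)
    # The worker never runs the callback itself; it only queues it.
    completed.put((callback, result))


def run_event_loop(jobs):
    completed = queue.Queue()
    main_id = threading.get_ident()
    seen = []

    def on_done(result):
        # Record the result and whether we are running on the main thread.
        seen.append((result, threading.get_ident() == main_id))

    for name in jobs:
        threading.Thread(target=worker, args=(name, on_done, completed)).start()

    for _ in jobs:
        callback, result = completed.get()  # main thread waits for the next event
        callback(result)                    # user code runs only here
    return seen
```

Every callback executes on the thread that runs the loop, regardless of which helper thread produced the result.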

How does node.js know if a call has to be executed in the thread pool or in the event loop?

I've read that Node.js uses both threads and an event loop.
I'm curious how it knows how to treat a callback. Is it specified by the EventEmitter (and does the engineer know whether it is going to be blocking or not)?
Or is it the core itself that chooses at runtime?
If it's the latter, how does it detect whether something has to be run async or threaded?
I've already read a lot of resources but didn't find anything about this. I'm reading the source code, but it's hard since it has been a long time since I last coded in C++.
Thanks
Your JavaScript code always runs in a single thread. That's because the V8 JavaScript engine is not threadsafe.
However, as an implementation detail of some of the C++ code, there may be threads. For example, suppose you write some JavaScript code that connects to a database. Your JavaScript code will of course be async, like any good Node code. But async coding is very uncommon in the C/C++ world, so the database vendor probably didn't write an async C/C++ API.
So when someone is writing a Node package for database access, they have to write a shim that adapts between the "blocking" C++ behavior and the "non-blocking, event-driven" Node behavior. When you call, say, a "connect" method, that goes to C++ code that spawns a new thread, and that thread issues a (blocking) "connect" call to the database, which blocks the thread until the connection is done. Then the C++ code will communicate the "connection done" back to the event queue, and the next time the main (JavaScript) thread polls the event queue, your callback will fire.
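The shim idea above can be sketched in Python's asyncio (not Node's C++ layer): a blocking call is pushed onto a helper thread and its completion is delivered back to the event loop. blocking_connect and the host name are hypothetical placeholders; loop.run_in_executor is the real asyncio call used for the hand-off.

```python
import asyncio
import threading
import time


def blocking_connect(host):
    # Hypothetical stand-in for a vendor's blocking client call.
    time.sleep(0.05)
    return {"host": host, "thread": threading.get_ident()}


async def connect(host):
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool executor.
    return await loop.run_in_executor(None, blocking_connect, host)


async def main():
    loop_thread = threading.get_ident()
    conn = await connect("db.example.invalid")
    # Report the host and whether the blocking call ran off the loop thread.
    return conn["host"], conn["thread"] != loop_thread


if __name__ == "__main__":
    print(asyncio.run(main()))
```

From the caller's point of view, connect is just another awaitable; the helper thread stays an implementation detail.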
So yes, there are threads, but their use should be completely transparent to you. When you're writing Node.js code in JavaScript, you don't need to worry about threads -- you just care that things happen when they're supposed to. Package authors may use threads, but that's purely an implementation detail and you should never have to worry about it. Your JavaScript code never explicitly uses threads.

Asynchronous vs Multithreading - Is there a difference?

Does an asynchronous call always create a new thread? What is the difference between the two?
Does an asynchronous call always create or use a new thread?
Wikipedia says:
In computer programming, asynchronous events are those occurring independently of the main program flow. Asynchronous actions are actions executed in a non-blocking scheme, allowing the main program flow to continue processing.
I know async calls can be done on a single thread. How is this possible?
Whenever the operation that needs to happen asynchronously does not require the CPU to do work, that operation can be done without spawning another thread. For example, if the async operation is I/O, the CPU does not have to wait for the I/O to complete. It just needs to start the operation, and can then move on to other work while the I/O hardware (disk controller, network interface, etc.) does the I/O work. The hardware lets the CPU know when it's finished by interrupting the CPU, and the OS then delivers the event to your application.
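A small sketch of "the OS delivers the event to your application", using Python's selectors module on a local socket pair. The socket pair stands in for a real network connection, and wait_for_reply is a made-up name.

```python
import selectors
import socket


def wait_for_reply():
    sel = selectors.DefaultSelector()
    ours, peer = socket.socketpair()
    ours.setblocking(False)
    sel.register(ours, selectors.EVENT_READ)

    # Nothing has been sent yet, so a zero-timeout poll reports no ready sockets.
    ready_before = sel.select(timeout=0)

    peer.sendall(b"reply")  # stand-in for data arriving from the network

    # The OS now reports the socket as readable; no thread sat blocked in recv.
    ready_after = sel.select(timeout=1)
    data = ours.recv(1024) if ready_after else b""

    sel.unregister(ours)
    sel.close()
    ours.close()
    peer.close()
    return len(ready_before), len(ready_after), data
```

Between the two select calls the single thread is free to do anything else; the kernel keeps track of which sockets have become ready.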
Frequently, higher-level abstractions and APIs don't expose the underlying asynchronous APIs available from the OS and the underlying hardware. In those cases it's usually easier to create threads to do asynchronous operations, even if the spawned thread is just waiting on an I/O operation.
If the asynchronous operation requires the CPU to do work, then generally that operation has to happen in another thread in order for it to be truly asynchronous. Even then, it will really only be asynchronous if there is more than one execution unit.
This question is darn near too general to answer.
In the general case, an asynchronous call does not necessarily create a new thread. That's one way to implement it, with a pre-existing thread pool or external process being other ways. It depends heavily on language, object model (if any), and run time environment.
Asynchronous just means the calling thread doesn't sit and wait for the response, nor does the asynchronous activity happen in the calling thread.
Beyond that, you're going to need to get more specific.
No, asynchronous calls do not always involve threads.
They typically do start some sort of operation which continues in parallel with the caller. But that operation might be handled by another process, by the OS, by other hardware (like a disk controller), by some other computer on the network, or by a human being. Threads aren't the only way to get things done in parallel.
JavaScript is single-threaded and asynchronous. When you use XMLHttpRequest, for example, you provide it with a callback function that will be executed asynchronously when the response returns.
John Resig has a good explanation of the related issue of how timers work in JavaScript.
Multithreading refers to more than one operation happening in the same process, while async programming spreads across processes. For example, if my operation calls a web service, the thread need not wait until the web service returns. Here we use async programming, which allows the thread not to wait for a process on another machine to complete. When the response from the web service starts arriving, it can interrupt the main thread to say that the web service has finished processing the request. Now the main thread can process the result.
Windows has had asynchronous processing since the non-preemptive days (versions 2.13, 3.0, 3.1, etc.) using the message loop, long before it supported real threads. So to answer your question: no, it is not necessary to create a thread to perform asynchronous processing.
Asynchronous calls don't even need to occur on the same system/device as the one invoking the call. So if the question is, does an asynchronous call require a thread in the current process, the answer is no. However, there must be a thread of execution somewhere processing the asynchronous request.
"Thread of execution" is a vague term. In cooperative tasking systems such as the early Macintosh and Windows OSes, the thread of execution could simply be the same process that made the request, running with another stack, instruction pointer, etc. However, when people talk about asynchronous calls, they typically mean calls that are handled by another thread if it is intra-process (i.e. within the same process) or by another process if it is inter-process.
Note that inter-process (or interprocess) communication (IPC) is commonly generalized to include intra-process communication, since the techniques for locking, and synchronizing data are usually the same regardless of what process the separate threads of execution run in.
Some systems allow you to take advantage of the concurrency in the kernel for some facilities using callbacks. For a rather obscure instance, asynchronous I/O callbacks were used to implement non-blocking internet servers back in the non-preemptive multitasking days of Mac System 6-8.
This way you have concurrent execution streams "in" your program without threads as such.
Asynchronous just means that you don't block your program waiting for something (function call, device, etc.) to finish. It can be implemented in a separate thread, but it is also common to use a dedicated thread for synchronous tasks and communicate via some kind of event system and thus achieve asynchronous-like behavior.
There are examples of single-threaded asynchronous programs. Something like:
    ...do something
    ...send some async request
    while (not done)
        ...do something else
        ...do async check for results
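A runnable take on that loop, as a sketch in Python. A local socket pair plays both the requester and the remote side, and the reply is simulated on the third iteration; those details are assumptions for the demo and not part of the original pseudocode.

```python
import socket


def poll_until_done():
    client, remote = socket.socketpair()
    client.setblocking(False)

    client.send(b"ping")                # send some async request
    other_work = 0
    reply = None
    while reply is None:                # while (not done)
        other_work += 1                 # do something else
        if other_work == 3:
            # Simulate the remote side answering after a while.
            remote.send(remote.recv(1024).upper())
        try:
            reply = client.recv(1024)   # do async check for results
        except BlockingIOError:
            pass                        # nothing yet, keep working
    client.close()
    remote.close()
    return reply, other_work
```

The non-blocking recv either returns data or raises BlockingIOError right away, so the single thread never stalls waiting for the reply.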
The nature of asynchronous calls is such that, if you want the application to continue running while the call is in progress, you will either need to spawn a new thread, or at least utilise another thread that you have created solely for the purpose of handling asynchronous callbacks.
Sometimes, depending on the situation, you may want to invoke an asynchronous method but make it appear to the user to be synchronous (i.e. block until the asynchronous method has signalled that it is complete). This can be achieved through Win32 APIs such as WaitForSingleObject.
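The same "block until the async operation signals completion" pattern can be sketched in Python with threading.Event. This is a stand-in for the idea only, not the Win32 API; start_async, call_synchronously and the result value 42 are made up for the illustration.

```python
import threading


def start_async(on_complete):
    # Hypothetical asynchronous operation: reports its result via a callback.
    def work():
        on_complete(42)

    thread = threading.Thread(target=work)
    thread.start()
    return thread


def call_synchronously(timeout=5.0):
    done = threading.Event()
    box = {}

    def on_complete(value):
        box["value"] = value
        done.set()  # signal the waiting caller

    start_async(on_complete)
    # Block the caller until the operation signals completion (or times out).
    if not done.wait(timeout):
        raise TimeoutError("async operation did not complete in time")
    return box["value"]
```

The caller sees an ordinary blocking function, while the work itself still runs asynchronously behind it.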

Resources