Play-Scala Async for Dummies - multithreading

I have been trying to research and understand Play's async, non-blocking abilities.
What I understand (which may be wrong):
Action.async returns a Future[Result], a placeholder for a result that has yet to be received. A request comes in and a method handles it (say, with a database query), but once the db call is made, the thread is freed up to take another request. So how does the system not lose track of that db call once no thread is attached to it?
Once the result is received, does an available thread then pick it up and respond with it?
When learning new concepts I usually like an animated or layman's-terms video that visually shows the threads and requests.
Also, isn't there waiting that has to happen for every request anyway? Is the point just that no thread is tied up while that waiting is going on?
Thanks in advance!

Related

How many threads does Node really have?

Since Node uses a JavaScript engine internally, does that mean Node has only one thread? I was checking online, and some people say two threads, some say one thread per core, and some say something else. Can anyone clarify?
And if it is single-threaded, how many requests can it handle concurrently? If 1000 requests come in at the same time and each request takes 100 ms, the 1001st will have to wait 100 s (100 × 1000 = 100,000 ms = 100 s).
So does that mean Node is not good for a large number of users?
Based on your writing, I can tell you know something about programming, some logic, and some basic math, and that you are getting into JavaScript (not java-script) now.
That background makes you an excellent fit for server-side JavaScript with Node.js, and I foresee you becoming great at it.
What is clear to me is that you are confusing parallelism with concurrency, and this post is going to be very useful for you.
But a TL;DR could be something like this:
Parallelism: 2 or more processes/threads running at the same time.
Concurrency: the main single process/thread does not wait for an IO operation to end; it keeps doing other things and gets back to it whenever the IO operation ends.
IO (Input/Output) operations involve interactions with the OS (Operating System) by using the DISK or the NETWORK, for example.
In nodejs, those tasks are asynchronous, and that's why they ask for a callback, or they are Promise based.
const fs = require('fs')

function myCallbackFunc (err, data) {
  if (err) {
    return console.error(err)
  }
  console.log(data)
}

fs.readFile('./some-large-file', myCallbackFunc)
fs.readFile('./some-tiny-file', myCallbackFunc)
A simple way to put it: in the example above you theoretically have a main thread taking care of your code and another one (which you don't control at all) observing the asynchronous requests; the second one will call myCallbackFunc whenever the IO operation, which happens concurrently, ends.
So YES, nodejs is GREAT for a large number of requests.
Of course, that single process still shares the same computational power across all the concurrent requests it is taking care of.
That means if, within the callback above, you do some heavy computational task that takes a while to execute, the second call has to wait for the first one to end.
In that case, if you are running it on your own Server/Container, you can make use of real parallelism by forking the process using the Cluster native module that already comes with nodejs :D
Node has one main thread (that's why it is called single-threaded), which receives all requests and hands them off to smaller internal threads in its thread pool. Node puts incoming requests in an event queue, and the server continuously runs an internal event loop that checks whether any request is placed in the event queue. If not, it waits indefinitely for incoming requests; otherwise it picks one request from the event queue.
It then checks thread availability in the internal thread pool; if a thread is available, it picks one and assigns the request to it.
That thread is responsible for taking the request, processing it, performing blocking IO operations, preparing the response, and sending it back to the event loop.
The event loop, in turn, sends that response to the respective client.
Click here for Node js architecture image
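A massively simplified, hypothetical sketch of the queue-and-loop flow this answer describes (Node's real loop lives in libuv and is far more involved; every name here is made up for illustration):

```javascript
// Hypothetical model only: callbacks wait in a queue, and a loop picks
// them up one at a time and runs each to completion.
const eventQueue = [];

function enqueue(callback) {
  eventQueue.push(callback);
}

function runEventLoop() {
  while (eventQueue.length > 0) {
    const callback = eventQueue.shift(); // pick the oldest queued request
    callback();                          // run it to completion
  }
}

enqueue(() => console.log('request 1 handled'));
enqueue(() => console.log('request 2 handled'));
runEventLoop();
```

The key property the model captures: each callback runs to completion before the next one starts, so nothing inside a callback ever runs in parallel with another callback.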

How is Node.js asynchronous AND single-threaded at the same time?

The question title basically says it all, but to rephrase it: what handles the asynchronous function execution if the main (and only) thread is occupied running down the main code block?
So far I have only found that the async code gets executed "elsewhere" or "outside the main thread", but what does this mean specifically?
EDIT: The proposed Node.js event loop question's answers may also address this topic, but I was looking for a less complex, more specific answer rather than an explanation of the Node.js concept. Also, it does not show up in searches for anything similar to "node asynchronous single-threaded".
EDIT, #Mr_Thorynque: Running a query to get data from the database and logging it to the console. Nothing gets logged because Node, being async, does not wait for the query to finish and for the data to populate. (This is just an example, as requested, NOT part of my question.)
var data;
mysql.query(`SELECT *some rows from database*`, function (err, rows, fields) {
  rows.forEach(function (row) {
    data += *gather the requested data*
  });
});
console.log(data);
What it really comes down to is that the Node process (which is single-threaded) hands the work off to "something else". This could be the OS's I/O subsystem, a networked resource, or whatever. By handing it off, Node frees its thread to keep working on the next in-memory thing. It uses file handles to keep track of any pending work, and in the event loop it marries the two back together and fires the callback when the work is done.
Note that this is also why you can block processing with your own poorly designed code. If your code runs complex tasks and doesn't hand off the work, you'll block the single thread.
This is about as simple an answer as I can make, I think that the details are well explained in the links in your comments.
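To make the asker's snippet concrete: the log has to move inside the callback, because that is the only place where the data is guaranteed to exist. A sketch, with a hypothetical fakeQuery standing in for mysql.query (the real driver's API is only partially shown in the question):

```javascript
// fakeQuery simulates an async database call: the rows only arrive
// later, via the callback, never as a return value.
function fakeQuery(sql, callback) {
  setTimeout(() => callback(null, [{ id: 1 }, { id: 2 }]), 10);
}

let data = [];
fakeQuery('SELECT * FROM table', (err, rows) => {
  if (err) return console.error(err);
  rows.forEach((row) => data.push(row.id));
  console.log(data); // here the rows have been gathered
});
// a console.log(data) down here would still print [] -- the query
// has not finished when this line runs
```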

How to properly construct a Twitter Future

We're using Finatra and have services return a Twitter Future.
Currently we use either Future { ... } or Future.value(..) to construct Future instances, but looking at the source this does not seem correct.
In Future.apply source doc it says: "that a is executed in the calling thread and as such some care must be taken with blocking code."
So, how to create a Future which executes the function on a separate thread, just like the Scala Future does?
You need a FuturePool for that. Something like val future = FuturePool.defaultPool { doStuff () }
Both Future.value and Future.apply are immediate. They are more or less equivalent to scala.concurrent.Future.successful.
+1 to Dima's answer, but...
Doing things in a background thread (FuturePool) because your server is struggling to keep up with request load isn't usually the correct solution. Assuming you are just processing a CPU-intensive task for 100 ms, it's probably better to keep it on the same thread and adjust the number of servers you have and the number of threads servicing requests.
But if you are doing something like querying a database or remote service, that call would ideally return a truly asynchronous Future that isn't blocking any finagle threads.
If you have a sync API wrapping a network service, then FuturePool is probably the correct way to work around it.

General Strategies for Profiling Simultaneous Asynchronous Requests

We have a system that makes 1 to N asynchronous requests ("foo") within the same time frame. These are launched on threads other than the main thread, and the requests don't necessarily all originate from the same thread.
Callbacks for the asynchronous requests are all handled on one specific thread, which for the sake of discussion, we'll call the 'bar' thread.
Everything done 'request side' is opaque to us. We don't have access to that library.
Up to this point in time, we've gotten away with a very naive profiler which basically calls markStart('measurement name') and markDone('measurement name') to time a request. I'm getting closer to having to profile the individual foo requests, from the time we start the foo request, to when it is handled by bar.
Obviously our existing profiler won't work, and I'll need to introduce a way to associate the correct markDone() call in callback with its corresponding markStart() from a foo.
If our requests returned some manner of sequence number in the response, it would be straightforward; however, we don't have one.
Is there a smart, generic way that I can associate an ID with each of the requests, that is visible across threads, or is profiling in this situation usually handled differently (if at all)?
I don't know of any profiler that will be useful for this.
That doesn't mean they don't exist.
I have faced this kind of problem before.
I wrote a book, and discussed this in it.
Basically I came up with two methods, one that works within-thread, and the other across threads.
You really need both, because either one can spend time unnecessarily.
So here are some scanned pages (not reproduced here).
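One generic approach to the asker's correlation problem can be sketched as follows. The idea: markStart returns a generated ID that the launching code captures (in a closure, or wherever the request's context lives) and hands to markDone in the callback. All names here are hypothetical, and a real cross-thread version would need a thread-safe map rather than a plain Map:

```javascript
let nextId = 0;
const inFlight = new Map(); // id -> { name, start }

// markStart returns a correlation ID instead of keying on the name,
// so N simultaneous 'foo' requests no longer collide.
function markStart(name) {
  const id = ++nextId;
  inFlight.set(id, { name, start: Date.now() });
  return id;
}

function markDone(id) {
  const entry = inFlight.get(id);
  inFlight.delete(id);
  return { name: entry.name, elapsedMs: Date.now() - entry.start };
}

// usage: capture the id when launching the request...
const id = markStart('foo');
setTimeout(() => {                 // stands in for the opaque async call
  const m = markDone(id);          // ...and close it out in the callback
  console.log(m.name, 'took', m.elapsedMs, 'ms');
}, 5);
```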

Single threaded and Event Loop in Node.js

First of all, I am a beginner trying to understand what Node.js is. I have two questions.
First Question
Felix's article says: "there can only be one callback firing at the same time. Until that callback has finished executing, all other callbacks have to wait in line".
Then consider the following code (copied from the official Node.js website):
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8124, "127.0.0.1");
If two client requests are received simultaneously, the workflow is:
First http request event received, second request event received.
As soon as the first event is received, its callback function starts executing.
Meanwhile, the callback function for the second event has to wait.
Am I right? If I am right, how does Node.js cope when there are thousands of client requests within a very short time?
Second Question
The term "Event Loop" is mostly used in Node.js topics. I have understood "Event Loop" as the following, from http://www.wisegeek.com/what-is-an-event-loop.htm:
An event loop - or main loop, is a construct within programs that
controls and dispatches events following an initial event.
The initial event can be anything, including pushing a button on a
keyboard or clicking a button on a program (in Node.js, I think the
initial events will be http request, db queries or I/O file access).
This is called a loop, not because the event circles and happens
continuously, but because the loop prepares for an event, checks the
event, dispatches an event and repeats the process all over again.
I have a conflict about the second paragraph, especially the phrase "repeats the process all over again". I accept that the http.createServer code from the question above is absolutely an "event loop" because it repeatedly listens for http request events.
But I don't know whether to classify the following code as event-driven or an event loop. It does not repeat anything except firing the callback function after the db query is finished.
database.query("SELECT * FROM table", function (rows) {
  var result = rows;
});
Please, let me hear your opinions and answers.
Answer one: your logic is correct. The second event will wait, and it will execute when its turn in the callback queue comes.
Also, remember that there is no such thing as "simultaneously" in the technical world. Everything has a very specific place and time.
The way node.js manages thousands of connections is that there is no need to hold a thread idle while some database call blocks the logic or another IO operation is processing (streams, for example). It can "serve" the first request, maybe creating more callbacks, and proceed to the others.
Because there is no way to block execution (except nonsense like while(true) and similar), it becomes extremely efficient at spreading actual resources over the whole application logic.
Threads are expensive, and a server's thread capacity is directly related to available memory. Most classic web applications suffer simply because RAM is spent on threads that sit idle while a database query or similar is blocking. In node that's not the case.
Still, node lets you create multiple processes (via child_process) through cluster, which expands the possibilities even further.
Answer two: there is no "loop" of the kind you might be thinking of. There is no loop behind the scenes that polls for connections or received data; nowadays that is handled by async methods as well.
So from the application's point of view there is no "main loop", and from the developer's point of view everything is event-driven (not an event loop).
In the case of http.createServer, you bind a callback as the response to requests. All the socket operations and IO stuff happen behind the scenes, as well as the HTTP handshake and the parsing of headers, queries, parameters, and so on. Once that behind-the-scenes job is done, node keeps the data and pushes your callback onto the event loop with it. Once the event loop is free and its turn comes, your callback executes in the node.js application context with the data from behind the scenes.
With a database request it is the same story: node prepares and sends the query (possibly async again, behind the scenes), then calls you back once the database responds and the data has been prepared for the application context.
To be honest, all you need with node.js is to understand the concept of events, not their implementation.
And the best way to do that is to experiment.
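In that spirit, here is a tiny experiment you can run to watch the ordering for yourself (the output order in the comment is the one Node guarantees for this snippet):

```javascript
console.log('start');
setTimeout(() => console.log('timer callback'), 0);     // queued for later
Promise.resolve().then(() => console.log('microtask')); // queued, runs sooner
console.log('end');
// prints: start, end, microtask, timer callback --
// synchronous code always runs to completion before queued callbacks fire
```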
1) Yes, you are right.
It works because everything you do with node is primarily I/O-bound.
When a new request (event) comes in, it's put into a queue. At initialization time, Node (via libuv) allocates a thread pool responsible for blocking I/O work such as file-system and DNS operations, while network/socket calls are handled with the OS's non-blocking facilities; either way, your JavaScript is not blocked.
Now, your "callbacks" (or event handlers) are extremely fast, because most of what you are doing is most likely CRUD and I/O operations, not CPU-intensive work.
Therefore these callbacks give the feeling of being processed in parallel, but they actually are not: the parallel work happens behind the scenes, while the callbacks per se just receive the results so that processing can continue and a response can be sent back to the client.
You can easily verify this: if your callbacks are heavy CPU tasks, you can be sure you will not be able to process thousands of requests per second, and it scales really badly compared to a multi-threaded system.
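A quick way to see that effect in miniature: a synchronous busy loop in one callback delays every other queued callback. Here busyWait is a deliberately wasteful stand-in for a heavy CPU task:

```javascript
// Spin the single thread for `ms` milliseconds without yielding.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* hog the thread */ }
}

const scheduled = Date.now();
setTimeout(() => {
  // fires only after busyWait returns, so lateness is roughly 50 ms
  console.log('timer lateness (ms):', Date.now() - scheduled);
}, 0);
busyWait(50); // blocks the event loop; the 0 ms timer cannot fire yet
```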
2) You are right, again.
Unfortunately, due to all these abstractions, you have to dive deeper to understand what's going on in the background. However, yes, there is a loop.
In particular, Node.js is implemented on top of libuv.
Interesting to read.
But I don't know whether to classify the following code as event-driven or an event loop. It does not repeat anything except firing the callback function after the db query is finished.
Event-driven is a term you normally use when there is an event loop; it describes an app driven by events such as click-on-button, data-arrived, etc. Normally you associate a callback with such events.
