How to properly construct a Twitter Future - multithreading

We're using Finatra and have services return a Twitter Future.
Currently we use either Future { ... } or Future.value(..) to construct Future instances, but looking at the source this does not seem correct.
In Future.apply source doc it says: "that a is executed in the calling thread and as such some care must be taken with blocking code."
So, how to create a Future which executes the function on a separate thread, just like the Scala Future does?

You need a FuturePool for that. Something like val future = FuturePool.defaultPool { doStuff () }
Both Future.value and Future.apply are immediate. They are more or less equivalent to scala.concurrent.Future.successful.

+1 to Dima's answer, but...
Doing things in a background thread (FuturePool) because your server is struggling to keep up with request load isn't usually the correct solution. Assuming you are just processing a CPU intensive task for 100ms, its probably better to keep it on the same thread and adjust the number of servers you have and the number of threads servicing requests.
But if you are doing something like querying a database or remote service, that call would ideally return a truly asynchronous Future that isn't blocking any finagle threads.
If you have a sync API wrapping a network service, then FuturePool is probably the correct thing to workaround it.

Related

Play-Scala Async for Dummies

I tried researching and understanding the async and non-blocking ability of play.
What I understand(may be wrong):
Action.async leads to Future[Result] which are placeholders of results that are yet to be received. A request comes in and a method handles it (say a database query) but as the db call is made, the thread is freed up to take another request. So how does the system not lose track of that db call as a thread is no longer with it?
Once the result is received does an available thread then pick up on that result and responds with it?
Usually when learning new concepts I'd like an animated or layman's terms type video that visually shows threads and requests.
Also isn't there waiting that has to be done on every request anyway, is it just that resources are used when that waiting is being done?
Thanks in advance!

General Strategies for Profiling Simultaneous Asynchronous Requests

We have a system that makes 1 to N asynchronous requests ("foo") within the same time frame. These are launched on threads other than the main and all of these requests don't necessarily originate from the same thread.
Callbacks for the asynchronous requests are all handled on one specific thread, which for the sake of discussion, we'll call the 'bar' thread.
Everything done 'request side' is opaque to us. We don't have access to that library.
Up to this point in time, we've gotten away with a very naive profiler which basically calls markStart('measurement name') and markDone('measurement name') to time a request. I'm getting closer to having to profile the individual foo requests, from the time we start the foo request, to when it is handled by bar.
Obviously our existing profiler won't work, and I'll need to introduce a way to associate the correct markDone() call in callback with its corresponding markStart() from a foo.
If our requests had some manner of sequence number returned in response it would be straight forward, however we don't have those.
Is there a smart, generic way that I can associate an ID with each of the requests, that is visible across threads, or is profiling in this situation usually handled differently (if at all)?
I don't know of any profiler that will be useful for this.
That doesn't mean they don't exist.
I have faced this kind of problem before.
I wrote a book, and discussed this in it.
Basically I came up with two methods, one that works within-thread, and the other across threads.
You really need both, because either one can spend time unnecessarily.
So here are some scanned pages:

Understanding the Event-Loop in node.js

I've been reading a lot about the Event Loop, and I understand the abstraction provided whereby I can make an I/O request (let's use fs.readFile(foo.txt)) and just pass in a callback that will be executed once a particular event indicates completion of the file reading is fired. However, what I do not understand is where the function that is doing the work of actually reading the file is being executed. Javascript is single-threaded, but there are two things happening at once: the execution of my node.js file and of some program/function actually reading data from the hard drive. Where does this second function take place in relation to node?
The Node event loop is truly single threaded. When we start up a program with Node, a single instance of the event loop is created and placed into one thread.
However for some standard library function calls, the node C++ side and libuv decide to do expensive calculations outside of the event loop entirely. So they will not block the main loop or event loop. Instead they make use of something called a thread pool that thread pool is a series of (by default) four threads that can be used for running computationally intensive tasks. There are ONLY FOUR things that use this thread pool - DNS lookup, fs, crypto and zlib. Everything else execute in the main thread.
"Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code. Compared to the Apache model, there are a lot less threads and thread overhead, since threads aren’t needed for each connection; just when you absolutely positively must have something else running in parallel and even then the management is handled by Node.js." via http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
Its like using, setTimeout(function(){/*file reading code here*/},1000);. JavaScript can run multiple things side by side like, having three setInterval(function(){/*code to execute*/},1000);. So in a way, JavaScript is multi-threading. And for actually reading from/or writing to the hard drive, in NodeJS, if you use:
var child=require("child_process");
function put_text(file,text){
child.exec("echo "+text+">"+file);
}
function get_text(file){
//JQuery code for getting file contents here (i think)
return JQueryResults;
}
These can also be used for reading and writing to/from the hard drive using NodeJS.

Node Background Threads - When Do These Get Created?

I've been doing a fair amount of work with Node lately, trying to build a system which has certain characteristics, one of which is non-blocking / parallelism - a Node strong suit, as I understand it.
What I don't fully understand is when a separate thread is spun off to handle some processing. I'm pretty sue this happens on a function call/call back, but certainly not all of them.
In my specific case, it's an Express based app. At app start-up it does several things including instantiating a RabbitMQ based "bus", an object with a method which will write to the bus (objA) and object which will subscribe to the bus and process messages coming across it (objB).
objA will write to the bus inside an express callback
app.put((req,res) => {
objA.methodWhichWritesToBus();
});
I believe at this point, that objA.methodWhichWritesToBus is executed in a background/worker thread - whatever you call it, not on the main event loop.
Is that the only point at which this sort of thing happens? methodWhichWritesToBus is IO instensive (it calls an elastic search service on another box and brings back 10's to 100's of thousands of records) with lots of chained promises etc., but none of that gets split off, does it?
How about the fact that the obj on which the method is called is instantiated outside the Express callback - does that affect the parallel-ism?
Finally, are the ways to effect/force a method etc to "run in the background"?
I've been noodling this, testing it, for awhile now but all on one machine so it's difficult to tell what's going on.
Who can clarify this for me?
Pre-answer: this is a topic best learned by going and reading, doing coding exercises to solidify your understanding, and working with the technology in a significant way. You're not going to "get it" based on a Q&A format. That said...
What I don't fully understand is when a separate thread is spun off to handle some processing.
Never, sort of. "Processing" as in the computation that happens in your javascript program, happens in the main event loop thread. End of story. However, waiting on I/O to come back from the OS is not considered "processing" so there are various queues managed by node and the OS to track pending I/O requests and invoke callbacks when data is ready. There are a handful of threads node uses internally to manage this stuff with the OS, but from your program's perspective, those threads are irrelevant. Your program can ask node to do some IO, then your program keeps running in parallel, and when the I/O is done, node will eventually invoke the callback in the main event loop and you can process the results.
I believe at this point, that objA.methodWhichWritesToBus is executed in a background/worker thread - whatever you call it, not on the main event loop.
You call it "asynchronously" and it happens whenever you do IO, including filesystem calls, networking, or child processes. Which is to say, quite a lot.
How about the fact that the obj on which the method is called is instantiated outside the Express callback - does that affect the parallel-ism?
Nope.
Finally, are the ways to effect/force a method etc to "run in the background"?
Generally I/O is done asynchronously by default, so no you don't normally need to force anything to run in the background. It's baked into the node design by way of the node core APIs themselves. However, there are ways to delay synchronous processing to a future event loop using setImmediate, setTimeout, or process.nextTick. I explain these in some detail in my blog post setTimeout and friends.
More precisely, all networking is asynchronous. End of story. Specifically, the APIs in node core that are available are all asynchronous, and there's simply no synchronous API available in node. For filesystem IO and child processes, there are both synchronous and asynchronous APIs, but the synchronous APIs must only be used under special limited circumstances, and if you don't know confidently that it's OK in this specific case to make a synchronous IO API call, you should use the asynchronous API so you don't break the lynchpin that makes node perform as it does.

Does Go have callback concept?

I found many talks saying that Node.js is bad because of callback hell and Go is good because of its synchronous model.
What I feel is Go can also do callback as same as Node.js but in a synchronous way. As we can pass anonymous function and do closure things
So, why are they comparing Go and Node.js in callback perspective as if Go cannot become callback hell.
Or I misunderstand the meaning of callback and anonymous function in Go?
A lot of things take time, e.g. waiting on a network socket, a file system read, a system call, etc. Therefore, a lot of languages, or more precisely their standard library, include asynchronous version of their functions (often in addition to the synchronous version), so that your program is able to do something else in the mean-time.
In node.js things are even more extreme. They use a single-threaded event loop and therefore need to ensure that your program never blocks. They have a very well written standard library that is built around the concept of being asynchronous and they use callbacks in order to notify you when something is ready. The code basically looks like this:
doSomething1(arg1, arg2, function() {
doSomething2(arg1, arg2, function() {
doSomething3(function() {
// done
});
});
});
somethingElse();
doSomething1 might take a long time to execute (because it needs to read from the network for example), but your program can still execute somethingElse in the mean time. After doSomething1 has been executed, you want to call doSomething2 and doSomething3.
Go on the other hand is based around the concept of goroutines and channels (google for "Communicating Sequential Processes", if you want to learn more about the abstract concept). Goroutines are very cheap (you can have several thousands of them running at the same time) and therefore you can use them everywhere. The same code might look like this in Go:
go func() {
doSomething1(arg1, arg2)
doSomething2(arg1, arg2)
doSomething3()
// done
}()
somethingElse()
Whereas node.js focus on providing only asynchronous APIs, Go usually encourages you to write only synchronous APIs (without callbacks or channels). The call to doSomething1 will block the current goroutine and doSomething2 will only be executed after doSomething1 has finished. But that's not a problem in Go, since there are usually other goroutines available that can be scheduled to run on the system thread. In this case, somethingElse is part of another goroutine and can be executed in the meantime, just like in the node.js example.
I personally prefer the Go code, since it's easier to read and reason about. Another advantage of Go is that it also works well with computation heavy tasks. If you start a heavy computation in node.js that doesn't need to wait for network of filesystem calls, this computation basically blocks your event loop. Go's scheduler on the other hand will do its best to dispatch the goroutines on a few number of system threads and the OS might run those threads in parallel if your CPU supports it.
What I feel is Golang can also do callback as same as Node.js but in a synchronous way. As we can pass anonymous function and do closure things
So, why are they comparing Golang and Node.js in callback perspective as if Golang cannot become callback hell.
Yes, of course it is possible to mess things up in Go as well. The reason why you don't see as much callbacks as in node.js is that Go has channels for communication, which allow for a way of structuring your code without using callbacks.
So, since there are channels, callbacks are not used as often therefore it is unlikely to stumble over callback infested code. Of course this doesn't mean that you cannot write scary code with channels as well...

Resources