Explain how non-blocking IO in Play works - multithreading

I found this article explaining how non-blocking IO works in the Play framework: https://engineering.linkedin.com/play/play-framework-async-io-without-thread-pool-and-callback-hell
Code example from that article:
object ProxyController extends Controller {
  def proxy = Action {
    val responseFuture: Future[Response] = WS.url("http://example.com").get()
    Logger.info("Before map")
    val resultFuture: Future[Result] = responseFuture.map { resp =>
      Logger.info("Within map")
      // Create a Result that uses the http status, body, and content-type
      // from the example.com Response
      Status(resp.status)(resp.body).as(resp.ahcResponse.getContentType)
    }
    Logger.info("After map")
    Async(resultFuture)
  }
}
The explanation from that article:
Under the hood, Play uses a thread pool sized to one thread per CPU
core. One of these scarce threads, T1, executes the proxy action,
running through the code from top to bottom, except the contents of
the function passed to the map method, since that depends on a
non-blocking I/O call that has not yet completed. Once T1 returns the
AsyncResult, it moves on to process other requests. Later on, when the
response from example.com is finally available, another thread T2
(which may or may not be the same as T1) executes the function passed
to the map method. At no point were either of the threads blocked
waiting on the response from example.com.
I don't understand the highlighted part of that paragraph. Once T1 has returned to the thread pool, how does the application keep track of the pending response from example.com, and how does it get a thread T2 to execute the map function once that response arrives?
Can someone please explain this to me?

> One of these scarce threads, T1, executes the proxy action, running through the code from top to bottom, except the contents of the function passed to the map method, since that depends on a non-blocking I/O call that has not yet completed.

At this point an object called a future has been created.

> Once T1 returns the AsyncResult, it moves on to process other requests.

The future sits in memory waiting to be fulfilled when some IO comes in. Meanwhile the thread T1 can process other requests.

> Later on, when the response from example.com is finally available,

When the response is available, this calls some internal Play code which reads the response and changes the state of the future to give it a value. The moment the future gets a value, it schedules its map code to run. That code will run on some thread in the thread pool, e.g. T2.

> another thread T2 (which may or may not be the same as T1) executes the function passed to the map method.

> At no point were either of the threads blocked waiting on the response from example.com.

Instead of waiting for IO by blocking a thread, Play secretly registered a callback so that when the IO is available it will fulfill a future. When the future is fulfilled, it automatically calls the next part of the application code. This means the application code can wait (without using a thread) until the IO is available.
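
To make that concrete, here is a minimal, runnable sketch of the mechanism in Scala. The names are illustrative and a timer thread stands in for the selector-based event loop of a real non-blocking HTTP client; this is not Play's actual internal code.

import java.util.{Timer, TimerTask}
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global

object CallbackDemo extends App {
  // A single shared "event loop" thread stands in for the NIO selector a
  // real non-blocking client would use; it services every pending IO
  // without dedicating a blocked thread to each request.
  private val eventLoop = new Timer("event-loop", true)

  def get(url: String): Future[String] = {
    val promise = Promise[String]()
    // Instead of blocking a thread, register a callback: when the simulated
    // IO completes, the event-loop thread fulfils the promise, and the
    // Future machinery schedules any functions registered via map onto a
    // pool thread (the article's "T2").
    eventLoop.schedule(new TimerTask {
      def run(): Unit = promise.success(s"response from $url")
    }, 100) // pretend the network takes 100 ms
    promise.future
  }

  // "T1" executes this, registers the map callback, and returns at once.
  val resultFuture: Future[Int] = get("http://example.com").map { body =>
    println(s"Within map, on thread ${Thread.currentThread.getName}")
    body.length
  }
  println(s"After map, on thread ${Thread.currentThread.getName}")

  Thread.sleep(500) // keep the demo alive long enough to see the callback run
}

Running it prints "After map" before "Within map", and "Within map" comes from a different (pool) thread: exactly the T1/T2 handoff the article describes.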

Related

Thread with a forever loop with one inherently async operation

I'm trying to understand the semantics of async/await in an infinitely looping worker thread started inside a windows service. I'm a newbie at this so give me some leeway here, I'm trying to understand the concept.
The worker thread will loop forever (until the service is stopped) and it processes an external queue resource (in this case a SQL Server Service Broker queue).
The worker thread uses config data which could be changed while the service is running by receiving commands on the main service thread via some kind of IPC. Ideally the worker thread should process those config changes while waiting for the external queue messages to be received. Reading from service broker is inherently asynchronous, you literally issue a "waitfor receive" TSQL statement with a receive timeout.
But I don't quite understand the flow of control I'd need to use to do that.
Let's say I used a concurrentQueue to pass config change messages from the main thread to the worker thread. Then, if I did something like...
void ProcessBrokerMessages() {
    foreach (BrokerMessage m in ReadBrokerQueue()) {
        ProcessMessage(m);
    }
}

// ... inside the worker thread:
while (!serviceStopped) {
    foreach (var configChange in configChangeConcurrentQueue) {
        processConfigChange(configChange);
    }
    ProcessBrokerMessages();
}
...then the foreach loop to process config changes and the broker processing function need to "take turns" to run. Specifically, the config-change-processing loop won't run while the potentially-long-running broker receive command is running.
My understanding is that simply turning ProcessBrokerMessages() into an async method doesn't help me in this case (or I don't understand what would happen). The most intuitive interpretation, to me, is that when I hit the async call it would go off and do its thing, and execution would continue with a restart of the outer while loop... but that would mean the loop would also execute the ProcessBrokerMessages() function over and over even though it's already running from the invocation in the previous iteration, which I don't want.
As far as I know this is not what would happen, though I only "know" that because I've read something along those lines. I don't really understand it.
Arguably the existing flow of control (ie, without the async call) is OK... if config changes affect ProcessBrokerMessages() function (which they can) then the config can't be changed while the function is running anyway. But that seems like it's a point specific to this particular example. I can imagine a case where config changes are changing something else that the thread does, unrelated to the ProcessBrokerMessages() call.
Can someone improve my understanding here? What's the right way to have
a block of code which loops over multiple statements
where one (or some) but not all of those statements are asynchronous
and the async operation should only ever be executing once at a time
but execution should keep looping through the rest of the statements while the single instance of the async operation runs
and the async method should be called again in the loop if the previous invocation has completed
It seems like I could use a BackgroundWorker to run the receive statement, which flips a flag when its job is done, but it also seems weird to me to create a thread specifically for processing the external resource and then, within that thread, create a BackgroundWorker to actually do that job.
You could use a CancellationToken. Most async functions accept one as a parameter, and they cancel the call (the returned Task, actually) if the token is signaled. SqlCommand.ExecuteReaderAsync (which you're likely using to issue the WAITFOR RECEIVE) is no different. So (a rough sketch follows the steps below):
Have a cancellation token passed to the 'execution' thread.
The settings monitor (the one responding to IPC) also holds a reference to the token.
When a config change occurs, the monitor applies the config change and then signals the token.
The execution thread aborts any pending WAITFOR (indeed, any pending work in the message-processing loop; you should use the cancellation token everywhere), and any open transaction is aborted and rolled back.
Restart the execution thread with a new cancellation token; it will pick up the new config.
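
Sketched in Scala rather than C# (all names here are hypothetical, and .NET's real CancellationToken is richer than this hand-rolled one), the shape is:

import java.util.concurrent.atomic.AtomicBoolean

// A hand-rolled token; .NET provides this out of the box.
final class CancellationToken {
  private val flag = new AtomicBoolean(false)
  def cancel(): Unit = flag.set(true)
  def isCancelled: Boolean = flag.get
}

object WorkerSketch {
  final case class Config(queueName: String)

  // Stand-in for the blocking WAITFOR RECEIVE; a real version would use a
  // short receive timeout so it can observe the token between waits.
  def waitForBrokerMessage(token: CancellationToken): Option[String] = {
    Thread.sleep(100) // simulate the receive timeout
    if (token.isCancelled) None else Some("message")
  }

  def run(loadConfig: () => Config, token: CancellationToken): Unit =
    while (!token.isCancelled) {
      waitForBrokerMessage(token) match {
        case Some(msg) =>
          println(s"processing $msg from ${loadConfig().queueName}")
        case None =>
          // Cancelled mid-wait: fall out so the caller can restart the
          // loop with the new config and a fresh token.
          ()
      }
    }
}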
So in this particular case I decided to go with a simpler shared state solution. This is of course a less sound solution in principle, but since there's not a lot of shared state involved, and since the overall application isn't very complicated, it seemed forgivable.
My implementation here is to use locking, but have writes to the config from the service main thread wrapped up in a Task.Run(). The reader doesn't bother with a Task since the reader is already in its own thread.
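
For what it's worth, that shape looks roughly like this as a Scala sketch (illustrative names; my actual code is C# with Task.Run): reads take the lock directly, writes take it on a pool thread so the main thread never blocks.

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

final class SharedConfig(initial: Map[String, String]) {
  private val lock = new Object
  private var value = initial // guarded by lock

  // Reader: called from the worker thread, which is already off the main thread.
  def read(): Map[String, String] = lock.synchronized(value)

  // Writer: wrap the lock acquisition in a Future so the service's main
  // thread never blocks waiting for the worker to release the lock.
  def writeAsync(updated: Map[String, String]): Future[Unit] =
    Future(lock.synchronized { value = updated })
}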

Serial Dispatch Queue with Asynchronous Blocks

Is there ever any reason to add blocks to a serial dispatch queue asynchronously as opposed to synchronously?
As I understand it, a serial dispatch queue only starts executing the next task in the queue once the preceding task has completed. If this is the case, I can't see what you would gain by submitting some blocks asynchronously: the act of submission may not block the thread (since it returns straight away), but the task won't be executed until the preceding task finishes, so it seems to me that you don't really gain anything.
This question has been prompted by the following code - taken from a book chapter on design patterns. To prevent the underlying data array from being modified simultaneously by two separate threads, all modification tasks are added to a serial dispatch queue. But note that returnToPool adds tasks to this queue asynchronously, whereas getFromPool adds its tasks synchronously.
class Pool<T> {
    private var data = [T]();
    // Create a serial dispatch queue
    private let arrayQ = dispatch_queue_create("arrayQ", DISPATCH_QUEUE_SERIAL);
    private let semaphore:dispatch_semaphore_t;

    init(items:[T]) {
        data.reserveCapacity(items.count);
        for item in items {
            data.append(item);
        }
        semaphore = dispatch_semaphore_create(items.count);
    }

    func getFromPool() -> T? {
        var result:T?;
        if (dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER) == 0) {
            dispatch_sync(arrayQ, {() in
                result = self.data.removeAtIndex(0);
            })
        }
        return result;
    }

    func returnToPool(item:T) {
        dispatch_async(arrayQ, {() in
            self.data.append(item);
            dispatch_semaphore_signal(self.semaphore);
        });
    }
}
Because there's no need to make the caller of returnToPool() block. It could perhaps continue on doing other useful work.
The thread which called returnToPool() is presumably not just working with this pool; it likely has other stuff it could be doing. That stuff can be done simultaneously with the work in the asynchronously-submitted task.
Typical modern computers have multiple CPU cores, so a design like this improves the chances that CPU cores are utilized efficiently and useful work is completed sooner. The question isn't whether tasks submitted to the serial queue operate simultaneously — they can't because of the nature of serial queues — it's whether other work can be done simultaneously.
Yes, there are reasons why you'd add tasks to a serial queue asynchronously. It's actually extremely common.
The most common example would be when you're doing something in the background and want to update the UI. You'll often dispatch that UI update asynchronously back to the main queue (which is a serial queue). That way the background thread doesn't have to wait for the main thread to perform its UI update, but rather it can carry on processing in the background.
Another common example is, as you've demonstrated, using a GCD queue to synchronize interaction with some object. If you're dealing with immutable objects, you can dispatch these updates asynchronously to this synchronization queue (i.e., why make the current thread wait when it could instead carry on?). You'll do reads synchronously (because you're obviously going to wait until you get the synchronized value back), but writes can be done asynchronously.
(You actually see this latter example frequently implemented with the "reader-writer" pattern and a custom concurrent queue, where reads are performed synchronously on concurrent queue with dispatch_sync, but writes are performed asynchronously with barrier with dispatch_barrier_async. But the idea is equally applicable to serial queues, too.)
The choice of synchronous vs. asynchronous dispatch has nothing to do with whether the destination queue is serial or concurrent. It's simply a question of whether you have to block the current queue until the other one finishes its task or not.
Regarding your sample code, that is correct. getFromPool should dispatch synchronously (because you have to wait for the synchronization queue to actually return the value), but returnToPool can safely dispatch asynchronously. Obviously, I'm wary of seeing code wait on semaphores if it might be called from the main thread (so make sure you don't call getFromPool from the main thread!), but with that one caveat, this code should achieve the desired purpose: reasonably efficient synchronization of the pool object, with a getFromPool that blocks while the pool is empty until something is added to it.
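
If it helps to see the same trade-off outside GCD, here's a rough Scala analogue (illustrative only, and it omits the semaphore's block-when-empty behaviour): a single-thread executor plays the role of the serial queue, reads block for their result, and writes are fire-and-forget.

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

final class SerialPool[T](items: List[T]) {
  // A single-thread executor serialises tasks, like a serial dispatch queue.
  private val serialQueue: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())

  private var data: List[T] = items // only ever touched on serialQueue

  // Like dispatch_sync: block the caller until the queue yields the value.
  def getFromPool(): Option[T] =
    Await.result(Future {
      data match {
        case head :: tail => data = tail; Some(head)
        case Nil          => None
      }
    }(serialQueue), Duration.Inf)

  // Like dispatch_async: submit the append and return to the caller at once.
  def returnToPool(item: T): Unit =
    Future { data = data :+ item }(serialQueue)
}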

Asynchronous IO: How does the thread get contacted after IO is complete?

So on an asynchronous framework such as Node or Netty, a worker thread can be given an IO job, which it initiates along with a callback. Then it returns and picks up a different task while that IO job, be it disk read, DB query, etc, runs.
My question is, after the IO is done, how is that event/callback picked up for further processing? I'm assuming in a synchronous operation, there is a thread right there waiting. But in an asynchronous environment, what picks up the completion of the IO, along with the response data? Does the worker thread periodically check for completion? Or does something register the completion event somehow with Node or Netty?
Sorry for lumping Netty and Node together, I'm assuming they do this similarly.
In Netty, every IO operation is asynchronous: a read or write method returns a Future object to which you can add a listener. Once the future completes, the listener is called back, and inside the listener you can do whatever you like.
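
For example (a sketch in Scala against Netty's ChannelFuture API; the channel setup is assumed to already exist):

import io.netty.channel.{Channel, ChannelFuture, ChannelFutureListener}

object NettyListenerSketch {
  def writeAndReact(channel: Channel, msg: Any): Unit = {
    val future: ChannelFuture = channel.writeAndFlush(msg) // returns immediately
    future.addListener(new ChannelFutureListener {
      // Invoked later by a Netty event-loop thread once the IO completes;
      // no application thread sits blocked waiting for it.
      override def operationComplete(f: ChannelFuture): Unit =
        if (f.isSuccess) println("write completed")
        else f.cause.printStackTrace()
    })
  }
}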

What happens with Win32 IO Completion Port and synchronous appearing IO?

According to http://support.microsoft.com/kb/156932, calls to ReadFile can appear synchronous if certain conditions are met, for example if the target file is NTFS-compressed. The article does not say anything about what happens if the file handle is associated with an IOCP.
So what happens in this case when the file handle is associated with an IOCP? Will I still receive IO completion packets for this request, or will the request be carried out completely synchronously?
If the latter, I have to put the whole ReadFile call in a worker thread. The thread that issues the ReadFile call initially is not allowed to block. The reason I am considering IOCP is that putting the ReadFile call into a worker thread means a context switch to the worker thread, which then blocks immediately on ReadFile.
Any overlapped operation that completes with ERROR_SUCCESS OR with ERROR_IO_PENDING will generate a completion packet. See tip 4 of this knowledge base article.
This assumes that you haven't enabled FILE_SKIP_COMPLETION_PORT_ON_SUCCESS on the handle in question, using SetFileCompletionNotificationModes(). If you HAVE enabled FILE_SKIP_COMPLETION_PORT_ON_SUCCESS then operations that complete with ERROR_SUCCESS will NOT generate a completion packet and you should do completion processing at the point where you issued the overlapped operation.

Does an asynchronous call always create/call a new thread?

Does an asynchronous call always create a new thread?
Example:
If JavaScript is single-threaded, how can it do an async postback? Is it actually blocking until it gets a callback? If so, is this really an async call?
This is an interesting question.
Asynchronous programming is a style of programming that is principally single-threaded, i.e. "following one thread of continuous execution".
You refer to JavaScript, so let's discuss that language in the environment of a web browser. A web browser runs a single thread of JavaScript execution in each window; it handles events (such as onclick="someFunction()") and network connections (such as XMLHttpRequest calls).
<script>
  var xmlhttp = new XMLHttpRequest();
  function performRequest() {
    xmlhttp.open("GET", "someurl", true);
    xmlhttp.onreadystatechange = function() {
      if (xmlhttp.readyState == 4) {
        alert(xmlhttp.responseText);
      }
    };
    xmlhttp.send();
  }
</script>
<span onclick="performRequest()">perform request</span>
(This is a nonworking example, for demonstration of concepts only).
In order to do everything in an asynchronous manner, the controlling thread has what is known as a 'main loop'. A main loop looks kind of like this:
while (true) {
    event = nextEvent(all_event_sources);
    handler = findEventHandler(event);
    handler(event);
}
It is important to note that this is not a 'busy loop'. This is kind of like a sleeping thread, waiting for activity to occur. Activity could be input from the user (Mouse Movement, a Button Click, Typing), or it could be network activity (The response from the server).
So in the example above,
When the user clicks on the span, a ButtonClicked event would be generated, findEventHandler() would find the onclick event on the span tag, and then that handler would be called with the event.
When the xmlhttp request is created, it is added to the all_event_sources list of event sources.
After the performRequest() function returns, the main loop is waiting at the nextEvent() step for a response. At this point there is nothing 'blocking' further events from being handled.
The data comes back from the remote server, nextEvent() returns the network event, the event handler is found to be the onreadystatechange() method, that method is called, and an alert() dialog fires up.
It is worth noting that alert() is a blocking dialog. While that dialog is up, no further events can be processed. It's an eccentricity of the javascript model of web pages that we have a readily available method that will block further execution within the context of that page.
The Javascript model is single-threaded. An asynchronous call is not a new thread, but rather interrupts an existing thread. It's analogous to interrupts in a kernel.
Yes it makes sense to have asynchronous calls with a single thread. Here's how to think about it: When you call a function within a single thread, the state for the current method is pushed onto a stack (i.e. local variables). The subroutine is invoked and eventually returns, at which time the original state is popped off the stack.
With an asynchronous callback, the same thing happens! The difference is that the subroutine is invoked by the system, not by the current code invoking a subroutine.
A couple notes about JavaScript in particular:
XMLHttpRequests are non-blocking by default. The send() method returns immediately after the request has been relayed to the underlying network stack. A response from the server will schedule an invocation of your callback on the event loop as discussed by the other excellent answers.
This does not require a new thread. The underlying socket API is selectable, similar to java.nio.channels in Java.
It's possible to construct synchronous XMLHttpRequest objects by passing false as the third parameter to open(). This will cause the send() method to block until a response has been received from the server, thus placing the event loop at the mercy of network latency and potentially hanging the browser until network timeout. This is a Bad Thing™.
Firefox 3.5 will introduce honest-to-god multithreaded JavaScript with the Worker class. The background code runs in a completely separate environment and communicates with the browser window by scheduling callbacks on the event loop.
In many GUI applications, an async call (like Java's invokeLater) merely adds the Runnable object to its GUI thread queue. The GUI thread is already created, and it doesn't create a new thread. But threads aren't even strictly required for an asynchronous system. Take, for example, libevent, which uses select/poll/kqueue, etc. to make non-blocking calls to sockets, which then fires callbacks to your code, completely without threads.
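
Here's a minimal, runnable sketch of that queue-driven model in Scala (hypothetical names): one thread, one queue, and "asynchronous calls" that merely enqueue work.

import java.util.concurrent.LinkedBlockingQueue

object EventLoopSketch extends App {
  private val queue = new LinkedBlockingQueue[Runnable]()

  // Like invokeLater: an "asynchronous call" is just an enqueue.
  def post(task: Runnable): Unit = queue.put(task)

  post(() => println("first event"))
  post(() => {
    println("second event enqueues a callback (an 'async call')")
    post(() => {
      println("the callback runs later, on the same single thread")
      sys.exit(0) // end the demo
    })
  })

  // The main loop: sleep until an event arrives, then run its handler.
  while (true) queue.take().run()
}

No new thread is created; each handler simply runs to completion before the next event is taken off the queue.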
No, but more than one thread will be involved.
An asynchronous call might launch another thread to do the work, or it might post a message into a queue on another, already running thread. The caller continues and the callee calls back once it processes the message.
If you wanted to do a synchronous call in this context, you'd need to post a message and actively wait for the callback to happen.
So in summary: More than one thread will be involved, but it doesn't necessarily create a new thread.
I don't know about javascript, but for instance in the Windows Forms world, asynchronous invocations can be made without multiple threads. This has to do with the way the Windows Message Pump operates. Basically a Windows Forms application sets up a message queue through which Windows places messages notifying it about events. For instance, if you move the mouse, messages will be placed on that queue. The Windows Forms application will be in an endless loop consuming all the messages that are thrown at it. According to what each message contains it will move windows around, repaint them or even invoke user-defined methods, amongst other things. Calls to methods are identified by delegates. When the application finds a delegate instance in the queue, it happily invokes the method referred by the delegate.
So, if you are in a method doing something and want to spawn some asynchronous work without creating a new thread, all you have to do is place a delegate instance into the queue, using the Control.BeginInvoke method. Now, this isn't actually multithreaded, but if you throw very small pieces of work to the queue, it will look like multithreaded. If, on the other hand you give it a time consuming method to execute, the application will freeze until the method is done, which will look like a jammed application, even though it is doing something.