How to wrap Web Worker response messages in futures? - multithreading

Please consider a Scala.js application that runs in the browser and consists of a main program and a web worker.
The main thread delegates long running operations to the web worker by passing messages that contain the names of methods and the parameters required to invoke them. The worker passes method return values back to the main thread in the form of response messages.
In simpler terms, this program abstracts web worker messaging so that code in the main thread can call methods in the worker thread in idiomatic and asynchronous Scala syntax.
Because web workers do not associate messages with their responses in any way, the abstraction relies on a registry, an intermediary object, that governs each cross context method call to associate the invocation with the result. This singleton could also bind callback functions but is there a way to accomplish this with futures instead of callbacks?
How can I build an abstraction over this registry that allows programmers to use it with the standard asynchronous programming structures in Scala: futures and promises?
How should I write this functionality so that Scala programmers can interact with it in the canonical way? For example:
// long running method in the web worker
val f: Future[String] = Registry.ultimateQuestion(42) // async
f onSuccess { case q => println("The ultimate question is: " + q) }
I'm new to futures and promises, but it seems like they usually complete when some execution block terminates. In this case, receiving a response from the web worker signifies completion of the future. Is there a way to write a custom future that delegates its completion status to an external process? Is there another way to link the web worker response message to the status of the future?
Can/Should I extend the Future trait? Is this possible in Scala.js? Is there a concrete class that I should extend? Is there some other way to encapsulate these cross context web worker method calls in existing asynchronous Scala functionality?
Thank you for your consideration.

Hmm. Just spitballing here (I haven't used workers yet), but it seems like associating the request with the Future is fairly easy in the single-threaded JavaScript world you're working in.
Here's a hypothetical design. Say that each request/response to the worker is automatically wrapped in an Envelope; the Envelope contains a RequestId. So the send side looks something like (this is pseudo-code, but real-ish):
def sendRequest[R](msg: Message): Future[R] = {
  val promise = Promise[R]()      // completed later by handleResult
  val id = nextRequestId()
  val envelope = Envelope(id, msg)
  register(id, promise)           // remember which promise belongs to this id
  sendToWorker(envelope)
  promise.future
}
The worker processes msg, wraps the result in another Envelope, and the result gets handled back in the main thread with something like:
def handleResult(resultEnv: Envelope): Unit = {
  val promise = findRegistered(resultEnv.id)
  val result = resultEnv.msg
  promise.success(result)
}
That needs some filling in, and some thought about what the types like R should be, but that sort of outline would probably work decently well. If this was the JVM you'd have to worry about all sorts of race conditions, but in the single-threaded JS world it probably can be as simple as using an autoincrementing integer for the request ID, and storing away the Promise...
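To make the outline above concrete, here is a minimal self-contained sketch. It assumes string payloads and a single-threaded caller (as in Scala.js), and every name in it (Envelope, WorkerRegistry, sendToWorker, RegistryDemo) is hypothetical. The worker is replaced by a plain function so the sketch runs anywhere; in a real application sendToWorker would call postMessage and handleResult would be invoked from the worker's message handler.

```scala
import scala.collection.mutable
import scala.concurrent.{Future, Promise}

// Hypothetical message wrapper: the id ties a response to its request.
final case class Envelope(id: Int, msg: String)

final class WorkerRegistry(sendToWorker: Envelope => Unit) {
  private var lastId = 0
  private val pending = mutable.Map.empty[Int, Promise[String]]

  // Main-thread side: register a promise, post the request, return the future.
  def sendRequest(msg: String): Future[String] = {
    lastId += 1
    val promise = Promise[String]()
    pending(lastId) = promise
    sendToWorker(Envelope(lastId, msg))
    promise.future
  }

  // Response side: look up the promise by id and complete it.
  // Responses with an unknown id are silently ignored.
  def handleResult(response: Envelope): Unit =
    pending.remove(response.id).foreach(_.success(response.msg))

  def pendingCount: Int = pending.size
}

object RegistryDemo {
  // Stand-in for the worker: records outgoing envelopes instead of posting them.
  val outbox = mutable.Buffer.empty[Envelope]
  val registry = new WorkerRegistry(env => outbox += env)

  val first = registry.sendRequest("ultimateQuestion(42)")
  val second = registry.sendRequest("ping")

  // Responses may arrive in any order; the id routes each one to its promise.
  registry.handleResult(Envelope(outbox(1).id, "pong"))
  registry.handleResult(Envelope(outbox(0).id, "what is six times nine?"))
}
```

Callers only ever see the Future, so they can use map, flatMap, foreach and so on as with any other future. A fuller version would probably also fail the promise when the worker reports an error.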

Related

Recoding a C++ task queue in Rust. Are futures the right abstraction?

I am rewriting a C++ project in Rust as my first non-tiny Rust program. I thought I would start with a simple but key gnarly piece of code.
It's a queue of std::packaged_tasks that run at specific times. A client says
running_func_fut_ = bus_->TimerQueue().QueueTask(std::chrono::milliseconds(func_def.delay),
    [this, func, &unit]()
    {
        func(this, &unit);
        Done();
    }, trace);
func is a std::function, but the key point is that, as far as the queue is concerned, it is queuing up a lambda (a closure in Rust speak).
It returns a std::future which the client can ignore or can hang onto. If they hang onto it they can see if the task completed yet. (It could return a result but in my current use case the functions are all void, the client just needs to know if the task completed). All the tasks run on a single dedicated thread. The QueueTask method wraps the passed lambda up in a packaged_task and then places it in a multiset of objects that say when and what to run.
I am reading the Rust docs and it seems that futures encapsulate both the callable object and the 'get me the result' mechanism.
So I think I need a BTreeSet of futures (I need the queue sorted by launch time so I can pick the next one to run), but I am not even sure how to declare one of those. So before I dive into the deep end of futures, is this the right approach? Is there a better, more natural abstraction for Rust?
For the output, you probably do want a Future. However, for the input, you probably want a function object (Box<dyn FnOnce(...)>); see https://doc.rust-lang.org/book/ch19-05-advanced-functions-and-closures.html.

How does NodeJS handle multi-core concurrency?

Currently I am working on a database that is updated by another Java application, but I need a NodeJS application to provide a RESTful API for website use. To maximize the performance of the NodeJS application, it is clustered and running on a multi-core processor.
However, from my understanding, a clustered NodeJS application has its own event loop on each CPU core. If so, does that mean that with a cluster architecture NodeJS will have to face traditional concurrency issues like in other multi-threaded architectures, for example, writing to the same object which is not write-protected? Or even worse, since it is multiple processes running at the same time, not threads within a process blocked by one another...
I have been searching the Internet, but it seems nobody cares about that at all. Can anyone explain the cluster architecture of NodeJS? Thanks very much.
Add on:
Just to clarify: I am using Express. It is not like running multiple instances on different ports; it is actually listening on the same port, but has one process on each CPU competing to handle requests...
The typical problem I am wondering about now is: a request to update Object A based on a given Object B (not finished), then another request to update Object A again with a given Object C (finishes before the first request)... then the result would be based on Object B rather than C, because the first request actually finishes after the second one.
This would not be a problem in a real single-threaded application, because the second one will always be executed after the first request...
The core of your question is:
NodeJS will have to face traditional concurrency issues like in other multi-threaded architectures, for example, writing to the same object which is not write-protected?
The answer is that that scenario is usually not possible because node.js processes don't share memory. ObjectA, ObjectB and ObjectC in process A are different from ObjectA, ObjectB and ObjectC in process B. And since each process is single-threaded, contention cannot happen. This is the main reason you find that there are no semaphore or mutex modules shipped with node.js. Also, there were no threading modules shipped with node.js when this was written (the worker_threads module arrived in later releases).
This also explains why "nobody cares". Because they assume it can't happen.
The problem with node.js clusters is one of caching. Because ObjectA in process A and ObjectA in process B are completely different objects, they will have completely different data. The traditional solution to this is of course not to store dynamic state in your application but to store them in the database instead (or memcache). It's also possible to implement your own cache/data synchronization scheme in your code if you want. That's how database clusters work after all.
Of course node, being a program written in C++, can easily be extended in C or C++, and there are modules on npm that implement threads, mutexes and shared memory. If you deliberately choose to go against the node.js/JavaScript design philosophy, then it is your responsibility to ensure nothing goes wrong.
Additional answer:
a request to update Object A based on a given Object B (not finished), then another request to update Object A again with a given Object C (finishes before the first request)... then the result would be based on Object B rather than C, because the first request actually finishes after the second one.
This would not be a problem in a real single-threaded application, because the second one will always be executed after the first request...
First of all, let me clear up a misconception you're having: that this is not a problem for a real single-threaded application. Here's a single-threaded application in pseudocode:
function main () {
    timeout = FOREVER
    readFd = []
    writeFd = []
    databaseSock1 = socket(DATABASE_IP,DATABASE_PORT)
    send(databaseSock1,UPDATE_OBJECT_B)
    databaseSock2 = socket(DATABASE_IP,DATABASE_PORT)
    send(databaseSock2,UPDATE_OBJECT_C)
    push(readFd,databaseSock1)
    push(readFd,databaseSock2)
    while(1) {
        event = select(readFd,writeFd,timeout)
        if (event) {
            for (i=0; i<length(readFd); i++) {
                if (readable(readFd[i])) {
                    data = read(readFd[i])
                    if (data == OBJECT_B_UPDATED) {
                        update(objectA,objectB)
                    }
                    if (data == OBJECT_C_UPDATED) {
                        update(objectA,objectC)
                    }
                }
            }
        }
    }
}
As you can see, there are no threads in the program above, just asynchronous I/O using the select system call. The program above can easily be translated directly into single-threaded C or Java etc. (indeed, something similar to it is at the core of the JavaScript event loop).
However, if the response to UPDATE_OBJECT_C arrives before the response to UPDATE_OBJECT_B the final state would be that objectA is updated based on the value of objectB instead of objectC.
No asynchronous single-threaded program is immune to this in any language and node.js is no exception.
Note however that you don't end up in a corrupted state (though you do end up in an unexpected state). Multithreaded programs are worse off because without locks/semaphores/mutexes the call to update(objectA,objectB) can be interrupted by the call to update(objectA,objectC) and objectA will be corrupted. This is what you don't have to worry about in single-threaded apps and you won't have to worry about it in node.js.
If you need strict temporally sequential updates you still need to either wait for the first update to finish, flag the first update as invalid, or generate an error for the second update. Typically for web apps (like Stack Overflow) an error would be returned (for example, if you try to submit a comment while someone else has already updated the comments).
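As an illustration of the "flag the first update as invalid" option, here is a small sketch in Scala. It assumes a single-threaded event loop (it is not thread-safe), and the names LatestWins, nextTicket and accept are made up for this example. Each request takes a ticket when it is sent, and a response is applied only if no response with a newer ticket has already been applied.

```scala
// Hypothetical helper: a response is applied only if it is newer than the
// last one applied. Assumes single-threaded use.
final class LatestWins[A](initial: A) {
  private var issued = 0   // ticket of the most recently sent request
  private var applied = 0  // ticket of the most recently applied response
  private var current: A = initial

  // Call when sending a request; keep the ticket with that request.
  def nextTicket(): Int = { issued += 1; issued }

  // Call when a response arrives. Returns false if the response is stale.
  def accept(ticket: Int, value: A): Boolean =
    if (ticket > applied) { applied = ticket; current = value; true }
    else false

  def value: A = current
}

object LatestWinsDemo {
  val objectA = new LatestWins[String]("initial")
  val ticketB = objectA.nextTicket() // update based on objectB, sent first
  val ticketC = objectA.nextTicket() // update based on objectC, sent second

  // The second request's response arrives first, then the first one's.
  val acceptedC = objectA.accept(ticketC, "from C")
  val acceptedB = objectA.accept(ticketB, "from B") // stale, so rejected
}
```

With this in place objectA ends up based on objectC regardless of arrival order. The rejected response could just as well be turned into an error for the caller.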

Synchronously request data within JavaFX thread from different thread

I've got a separate thread which needs to request some data that may change in the meantime within the JavaFX thread. I'd like to execute a blocking invocation in this separate thread that makes sure that the request is enqueued on the JavaFX thread.
The Swing-GUI testing framework, AssertJ, provides an easy to use API for this purpose:
List list = GuiActionRunner.execute(new GuiQuery<...>...);
The invocation blocks the current thread, executes the passed code within event dispatching thread and returns the required data.
How can this be implemented in production code for JavaFX applications? What would be the recommended approach for this requirement?
Here's an alternative solution, using a FutureTask. This avoids the explicit latch and managing the synchronized data in an AtomicReference. The code here is probably simple enough that it would make including this functionality in Platform redundant.
FutureTask<List<?>> task = new FutureTask<>( () -> {
    List<?> data = ... ; // access data
    return data ;
});
Platform.runLater(task);
List<?> data = task.get();
This technique is very useful if you want to pause a background thread to await user input.
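For anyone who wants to try the FutureTask pattern without a JavaFX setup, here is a rough sketch in Scala. It makes one loud assumption: a single-threaded executor stands in for the JavaFX Application Thread, so uiThread.execute(task) plays the role of Platform.runLater(task). The names UiThreadQueryDemo and runAndWait are invented for the example.

```scala
import java.util.concurrent.{Callable, Executors, FutureTask, ThreadFactory}

object UiThreadQueryDemo {
  // Stand-in for the JavaFX Application Thread (an assumption so the sketch
  // runs without JavaFX). In a real application, uiThread.execute(task)
  // would be Platform.runLater(task).
  private val uiThread = Executors.newSingleThreadExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "fake-ui-thread")
      t.setDaemon(true) // do not keep the JVM alive for the demo
      t
    }
  })

  // Runs `query` on the stand-in UI thread and blocks the caller until the
  // result is available. FutureTask.get() rethrows a failure inside `query`
  // as an ExecutionException.
  def runAndWait[A](query: () => A): A = {
    val task = new FutureTask[A](new Callable[A] { def call(): A = query() })
    uiThread.execute(task)
    task.get()
  }
}
```

One caveat: calling a helper like this from the UI thread itself would deadlock, because get() would block the very thread that is supposed to run the task. With JavaFX you would check Platform.isFxApplicationThread() first and run the query directly in that case.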
Ok I think I got it now. You need to implement something like this yourself:
AtomicReference<List<?>> r = new AtomicReference<>();
CountDownLatch l = new CountDownLatch(1);
Platform.runLater( () -> {
    // access data
    r.set(...);
    l.countDown();
});
l.await();
System.err.println(r.get());

Which response belongs to which task in a node.js threadPool?

Imagine you are going to have a lot of long processor intensive tasks of translating some strings into something else. You are going to want to have a pool of actual threads to keep the main node thread going and to make use of your cores.
The main way to do this is to use either Threads-a-gogo or Webworker-Threads, and start a pool of 16 threads (e.g. on an Intel with 8 cores you usually have 16 threads running concurrently).
Doing a request to a thread is called an event or a message. Getting a response is also catching an event or getting a message. But how does this work with a threadPool?
If you skip the Webworker API, TAGG and Webworkers for node have the same underlying API. You can load your translation function in all workers using threadPool.load and queue a task to one of them using threadPool.any.
But imagine I now have 50 tasks (strings to translate) to be queued. The threadPool will eventually emit 50 events (responses with a translated string) without telling me what task the response belongs to?
I think I am fundamentally misunderstanding one thing about the threadPool.
Is there a way I can just add a task to the threadPool queue and receive a callback when that particular task is done?
Why emit events from the thread pool when you can just return the translated string? The value returned by the code is received by the callback you passed to threadPool.any.eval(). Example:
threadPool.any.eval('return "hello world"', function(err, data) {
    // data === 'hello world'
});

Asynchronous IO in Scala with futures

Let's say I'm getting a (potentially big) list of images to download from some URLs. I'm using Scala, so what I would do is :
import scala.actors.Futures._
// Retrieve URLs from somewhere
val urls: List[String] = ...
// Download image (blocking operation)
val fimages: List[Future[...]] = urls.map (url => future { download(url) })
// Do something (display) when complete
fimages.foreach (_.foreach (display _))
I'm a bit new to Scala, so this still looks a little like magic to me :
Is this the right way to do it? Any alternatives if it is not?
If I have 100 images to download, will this create 100 threads at once, or will it use a thread pool?
Will the last instruction (display _) be executed on the main thread, and if not, how can I make sure it is?
Thanks for your advice!
Use Futures in Scala 2.10. They were joint work between the Scala team, the Akka team, and Twitter to reach a more standardized future API and implementation for use across frameworks. We just published a guide at: http://docs.scala-lang.org/overviews/core/futures.html
Beyond being completely non-blocking (by default, though we provide the ability to do managed blocking operations) and composable, Scala's 2.10 futures come with an implicit thread pool to execute your tasks on, as well as some utilities to manage time outs.
import scala.concurrent.{future, blocking, Future, Await}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
// Retrieve URLs from somewhere
val urls: List[String] = ...
// Download image (blocking operation)
val imagesFuts: List[Future[...]] = urls.map {
  url => future { blocking { download(url) } }
}
// Do something (display) when complete
val futImages: Future[List[...]] = Future.sequence(imagesFuts)
Await.result(futImages, 10.seconds).foreach(display)
Above, we first import a number of things:
future: API for creating a future.
blocking: API for managed blocking.
Future: Future companion object which contains a number of useful methods for collections of futures.
Await: singleton object used for blocking on a future (transferring its result to the current thread).
ExecutionContext.Implicits.global: the default global thread pool, a ForkJoin pool.
duration._: utilities for managing durations for time outs.
imagesFuts remains largely the same as what you originally did. The only difference is that we use managed blocking, via blocking. It notifies the thread pool that the block of code you pass to it contains long-running or blocking operations. This allows the pool to temporarily spawn new workers so that the workers are never all blocked at once. This is done to prevent starvation (locking up the thread pool) in blocking applications. Note that the thread pool also knows when the code in a managed blocking block is complete, so it will remove the spare worker thread at that point, which means that the pool will shrink back down to its expected size.
(If you want to absolutely prevent additional threads from ever being created, then you ought to use an AsyncIO library, such as Java's NIO library.)
Then we use the collection methods of the Future companion object to convert imagesFuts from List[Future[...]] to a Future[List[...]].
The Await object is how we can ensure that display is executed on the calling thread: Await.result simply forces the current thread to wait until the future that it is passed is completed. (This uses managed blocking internally.)
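For reference, here is the same flow as a self-contained sketch that compiles on current Scala versions. It uses Future { ... } (the companion's apply) because the lowercase future helper was deprecated in later releases. The download function is a hypothetical stand-in that just sleeps and returns the URL length, and DownloadDemo and downloadAll are made-up names.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object DownloadDemo {
  // Hypothetical stand-in for a blocking download; returns a fake byte count.
  def download(url: String): Int = {
    Thread.sleep(50) // simulate blocking I/O
    url.length
  }

  def downloadAll(urls: List[String]): List[Int] = {
    // One future per URL; `blocking` marks the body as managed blocking.
    val futures: List[Future[Int]] =
      urls.map(url => Future(blocking(download(url))))
    // List[Future[Int]] => Future[List[Int]], preserving input order.
    val all: Future[List[Int]] = Future.sequence(futures)
    // Block the calling thread until every download has finished.
    Await.result(all, 10.seconds)
  }
}
```

Whatever you do with the returned list (for example, displaying each image) then happens on the thread that called downloadAll.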
val all = Future.traverse(urls){ url =>
  val f = future(download(url)) /*(downloadContext)*/
  f.onComplete(display)(displayContext)
  f
}
Await.result(all, ...)
Use scala.concurrent.Future in 2.10, which is RC now and which uses an implicit ExecutionContext.
The new Future doc is explicit that onComplete (and foreach) may evaluate immediately if the value is available. The old actors Future does the same thing. Depending on what your requirement is for display, you can supply a suitable ExecutionContext (for instance, a single thread executor). If you just want the main thread to wait for loading to complete, traverse gives you a future to await on.
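Here is a small sketch of supplying a dedicated ExecutionContext for the callback. It assumes a single-threaded executor as a stand-in for whatever thread should do the displaying; DisplayContextDemo, displayContext and displayThreadFor are hypothetical names, and the "work" is just computing a string length.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object DisplayContextDemo {
  // Hypothetical single-threaded context standing in for a display thread.
  private val executor = Executors.newSingleThreadExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "display-thread")
      t.setDaemon(true) // do not keep the JVM alive for the demo
      t
    }
  })
  val displayContext: ExecutionContext = ExecutionContext.fromExecutor(executor)

  // Does the work on the global pool, runs the callback on displayContext,
  // and returns the name of the thread the callback ran on.
  def displayThreadFor(url: String): String = {
    val shown = Promise[String]()
    val work = Future(url.length)(ExecutionContext.global)
    work.onComplete(_ => shown.success(Thread.currentThread().getName))(displayContext)
    Await.result(shown.future, 5.seconds)
  }
}
```

Because the context is passed explicitly to onComplete, the callback runs on that executor's thread no matter which pool thread completed the future.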
Yes, seems fine to me, but you may want to investigate more powerful twitter-util or Akka Future APIs (Scala 2.10 will have a new Future library in this style).
It uses a thread pool.
No, it won't. You need to use the standard mechanism of your GUI toolkit for this (SwingUtilities.invokeLater for Swing or Display.asyncExec for SWT). E.g.
fimages.foreach (_.foreach(im => SwingUtilities.invokeLater(new Runnable { def run() = display(im) })))
