Why do coroutines have futures? - rust

Once you have coroutines you can create pipelines (haskell: pipes, conduits; python: generators) or cooperative event loops (python: curio). Once you have futures, it appears you can do the same; pipelines (rust: futures-rs) and event loops (rust: tokio). Since futures aren't cooperative they require a callback-based (even poll-based futures require callbacks) scheduler to execute blocking tasks within a thread or process pool. What benefits are there to combining futures (library-level) with coroutines (language-level) as these languages do: (python: asyncio), (rust: rfc), (ecmascript 6+). Fundamentally they seem to be conflicting solutions to the same problem.
I'm not looking for a pro/con comparison, and I don't buy the argument that futures are "one-shot" coroutines. Just look at rust, which built an entire state-machine-based event framework using just futures. I want to know why python/asyncio and javascript both require coroutines together with futures. Why is rust planning on adding coroutines to its futures? Does it have to do with composability of events? Or the implicit stack of coroutines versus the explicit stack of continuation-passing futures? Not that I completely understand this argument, as both futures and coroutines are implemented using continuations... Or does it have something to do with direct vs indirect style?

These are all different (though related) ideas with different amounts of power.
A future is an abstraction that lets you begin a process and then yield back to a handler, that is chosen by the original caller, when the process is done.
A generator is more powerful than a future because it can yield multiple times. You can implement futures on top of generators.
A coroutine is more powerful than a generator because it can choose who to yield to, instead of only to the caller. For example it can yield to another coroutine. You can implement generators on top of coroutines.
Why would you use the less powerful tool, when more powerful ones are available? Sometimes the less powerful tool is the right tool for the job. It's useful to statically encode your program's invariants using types, because it can give you certainty about what something can't do.
For example, when making a REST call to a remote server, a future is probably sufficient. If the REST client exposed a generator, you'd have to deal with the possibility that it could yield multiple times, even though you know there is only going to be one result. If it exposed a coroutine, you'd have to consult the documentation to work out exactly how you're supposed to interact with it - even though you actually only need to do one thing, which is obvious when you're dealing with a future.
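As a rough illustration of how the two ideas fit together, here is a minimal JavaScript sketch in which each one-shot promise carries a single result while a generator is the thing that gets suspended and resumed around it. The names run, fetchNumber and addTwo are made up for this example, and fetchNumber only simulates a remote call with a timer.

```javascript
// Hypothetical helper: drives a generator that yields promises and
// resumes it with each settled value (roughly what async/await automates).
function run(genFn) {
  return new Promise((resolve, reject) => {
    const gen = genFn();
    function step(method, arg) {
      let result;
      try {
        result = gen[method](arg); // resume the generator
      } catch (err) {
        reject(err); // the generator threw: fail the outer promise
        return;
      }
      if (result.done) {
        resolve(result.value); // the generator returned
        return;
      }
      // Wait for the yielded one-shot value, then resume again.
      Promise.resolve(result.value).then(
        (value) => step('next', value),
        (err) => step('throw', err)
      );
    }
    step('next', undefined);
  });
}

// Hypothetical one-shot operation standing in for a REST call.
function fetchNumber(n) {
  return new Promise((resolve) => setTimeout(() => resolve(n), 10));
}

// The generator can suspend many times; each promise settles only once.
function* addTwo() {
  const a = yield fetchNumber(1);
  const b = yield fetchNumber(2);
  return a + b;
}
```

Calling run(addTwo) gives back a promise for the sum, so the caller still sees the "one result" shape even though the body suspends twice.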

Related

Is there any area where Threads should be favored over Coroutines?

I just gave a talk about Kotlin's coroutines, and the question arose whether coroutines can always replace threads or whether there might also be disadvantages.
Or the other way round: is there any area where coroutines should not be used?
Coroutines are useful for asynchronous programming. When you are writing code that has to wait most of the time for some external events, as often happens in modern connected user interfaces and micro-service-oriented backend applications, coroutines and the concept of Kotlin suspending functions let you write natural-looking, easy-to-understand code that is more scalable than code with explicit threads.
If you are writing some kind of computation, CPU-intensive code, then you'd find that classical patterns of multithread-programming and parallelism work better.
It does not mean that you cannot use coroutines to parallelize some piece of CPU-intensive application, but you will not get any benefits in either code readability or its performance from doing so.
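The "mostly waiting" case can be sketched in JavaScript rather than Kotlin. This is only an analogy under that assumption: waitFor is a hypothetical stand-in for an external event, and loadDashboard is an invented name.

```javascript
// Hypothetical stand-in for an external event (network reply, timer, ...).
function waitFor(ms, value) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// The three waits overlap on a single thread; no thread is parked
// while each one is pending.
async function loadDashboard() {
  const [user, orders, news] = await Promise.all([
    waitFor(30, 'user'),
    waitFor(20, 'orders'),
    waitFor(10, 'news'),
  ]);
  return { user, orders, news };
}
```

For CPU-bound work this style buys nothing, since there is no waiting to overlap.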

Is Parallel.ForEach obsolete, old, out of fashion?

Good day,
Parallel execution can be achieved through multiple ways. From strictly manual "multithreading" to using various "helpers" created by Microsoft. One of those helpers is the Parallel class.
One of my colleagues is insisting that Parallel.ForEach (or Parallel class overall) is "old" and shouldn't be used. Instead, he says, one should use async operations. In other words you should use Task.WhenAll() instead of Parallel.ForEach() .
When asked why not use Parallel.ForEach(), when this is exactly what was needed - multiple expensive operations executed in parallel, he replied that Parallel.ForEach() is old and that Microsoft recommends using async/await wherever possible.
I searched all over MSDN and Stackoverflow and everywhere I could but I couldn't find anything pointing at the necessity of using async/await instead of .Parallel. While you often can achieve similar results by interchanging these tools it doesn't mean that Parallel.ForEach is obsolete. Or does it?
Does anyone have a link to some "best practices" or "recommendations" by a reputable body (MSDN?) that would say that Parallel.ForEach() is being phased out and one needs to stick with creating, running and awaiting tasks?
Please do not post answers related to Parallel VS Async for this as this is not the question.
The question is: Since you can make tasks run in parallel using async/await WhenAll (WaitAll etc.), does it make 'Parallel' class obsolete, old, or not fashionable in .NET 4.5 onward?
I don't think Parallel.ForEach is obsolete.
Ever since introducing the Task Parallel Library (TPL) with .NET 4, Microsoft has distinguished between "Data Parallelism" (e.g. Parallel.ForEach) and "Task Parallelism" (Task). From MSDN:
"Data parallelism refers to scenarios in which the same operation is performed concurrently (that is, in parallel) on elements in a source collection or array."
"[T]ask parallelism refers to one or more independent tasks running concurrently."
(Emphasis by me. Like dcastro commented (above): "Your friend is confusing parallelism with asynchrony.")
These two types of parallelism/concurrency pursue different goals, so the TPL offers different capabilities for each of them.
Conceptually, Task.WhenAll belongs in the task parallelism category, so I don't think it obsolesces something that belongs to the other (data parallelism) category.
Parallel.ForEach (and PLINQ as a whole) has abilities that are not available in the async language support.
For example, you can limit the degree of parallelism (e.g. 100 items to process, but do no more than 10 at a time). Thus it is not obsolete.
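To show what "no more than N at a time" means in practice, here is a small JavaScript sketch of a concurrency cap. It is not Parallel.ForEach and runs on one thread; mapWithLimit is a hypothetical helper written only for this illustration.

```javascript
// Hypothetical helper: applies an async worker to every item, but keeps
// at most `limit` calls in flight at any moment.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims items one at a time until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results; // same order as the input
}
```

Without such a helper, a plain "start everything, then wait for all" approach launches all 100 operations at once.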
Fundamentally async is about making concurrent operations – without any assumptions of threading – easier to write. PLINQ is about computation making use of many cores.
I suspect your colleague is reading too much into direct use of Task Parallel Library (TPL) largely being unnecessary with async in the language (except for the return type of async functions). But PLINQ was always a different layer over TPL. If anything PLINQ and async are two separate ways to make use of TPL for different purposes.
async and await have nothing at all to do with parallelism. They are technologies used to make existing asynchronous APIs easier to consume and expose. async and await do not initiate parallelism or concurrency. In fact await ends parallelism by waiting for something that is already running.
Parallel.ForEach is used to process a set of homogeneous items in the same way on multiple cores. You can simulate Parallel.ForEach by spawning a big number of tasks. There is no advantage in doing that. In fact it introduces inefficiencies and obfuscates the code. It is possible and works but it is an inferior way of doing things if Parallel.ForEach is applicable.
I think your colleague does not understand that await really just waits. It does not start something.
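The "await just waits" point can be seen in a few lines of JavaScript, whose await behaves analogously here. This is a sketch with made-up names (startJob, demo), using timers in place of real work.

```javascript
// Hypothetical job: the work is kicked off when startJob is called,
// not when its promise is awaited.
function startJob(name, log) {
  log.push('started ' + name);
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push('finished ' + name);
      resolve(name);
    }, 10);
  });
}

async function demo() {
  const log = [];
  const a = startJob('a', log); // already running
  const b = startJob('b', log); // already running
  log.push('nothing awaited yet');
  await a; // only waits for something that was started above
  await b;
  return log;
}
```

The log returned by demo() records both jobs as started before anything is awaited.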
Use Parallel.* and PLINQ (mostly) for CPU-bound work.

node.js modules: Async vs Fibers.promise vs Q_oper8

Just wondering if anyone could give me a comparison of trade-offs between these modules for handling async events. Specifically, I'm interested in knowing about reasons to use Async instead of Fibers.promise, which I am using quite extensively at least in my test code right now. In particular, one of the major pluses I see in Fibers.promise is that I can keep the stack chain from bifurcating, making it possible to use try { } catch { } finally, and also allowing me to ensure that after a request has been handled that the response is ended.
Is anyone using Q_oper8? I found this on another page and was just wondering if that's already dead or if it's something I should check out.
I've never heard of Q_oper8, so I can't comment on it, but I'll come at this from the other direction. I heard about async first and Fiber (and its helper libraries) second, and I don't like the latter, actually.
The Downsides of Fiber
Unfamiliarity for other Javascript developers
Fiber introduces the concept of co-routines to Javascript via a compiled Fiber native method that takes over the interpretation of the Javascript code passed to it, intercepting calls to yield to jump back to the waiting co-routine.
This may not matter to you, but if you need to work on a team, you'll have to teach the concept to your members (or hope they have experience with the concept from other languages, like Go).
No Windows Support
So, in order to use Fiber or any of the libraries written on top of it, you'll have to natively compile it for your platform first. I don't use Windows, but note that Fiber is not supported on Windows, so that restricts the utility of your own library off-the-bat. Which means you won't be finding general-purpose Node.js libraries written in Fiber at all (and you probably wouldn't have, anyways, since it adds a costly compilation step that you'd otherwise avoid with async).
Browser Incompatible
This means any code you write using Fiber will not be able to run in the browser, because you can't mix native code with the browser (nor would I as a browser user want you to), even if everything you write is "Javascript" (it's syntactically Javascript, but semantically not).
More Difficult Debugging
While the "callback hell" may be less visually pleasing, Continuation-Passing Style does have one very good thing going for it over Co-Routines -- you know exactly where a problem has occurred from the call stack and can trace backwards. Co-Routines enter the function at more than one point in the program, and can exit from three kinds of calls: return, throw and yield(), where the latter is also a return point.
With co-routines, you have cross-execution between two or more functions running "simultaneously", and you may have more than one set of co-routines running at the same time on the event loop. With traditional callbacks, you're guaranteed that the outer scope of the function is static during the execution of said function, so you only need to check those outer variables once if they're needed. Co-routines need these checks to be run after every yield() (since its usage with the originating co-routine would be translated into a callback chain in real Javascript).
Basically, I think the co-routine concept is made more difficult to work with because it has to exist inside of the Javascript event loop, rather than being a method to implement one.
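The need to re-check outer variables after a yield can be sketched with plain ES6 generators instead of Fiber (an assumption made so the example runs without native code). The withdraw function and the balance variable are invented for the illustration.

```javascript
let balance = 100; // shared outer variable

function* withdraw(amount) {
  if (balance >= amount) { // the check passes here...
    yield 'waiting';       // ...but control leaves the function here
    balance -= amount;     // the earlier check may no longer hold
  }
  return balance;
}

const first = withdraw(80);
const second = withdraw(80);
first.next();  // both pass the check while balance is still 100
second.next();
first.next();  // balance drops to 20
const finalBalance = second.next().value; // second withdrawal still goes through
```

Both withdrawals succeed even though only one should, because the check was not repeated after the yield.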
What makes Async "better"?
Worse is Better
It's sort of the "worse-is-better" idea, actually. Rather than extend the Javascript language to try and get rid of its warts (and create new ones, in my opinion), Async is a pure-Javascript solution to cover them up, like makeup.
Control flow explicit
The Async functions describe different types of logic flow that needs to cross the event loop barrier, and the library covers up the implementation details of the callback code needed to implement that logic, and you just provide it functions it should run in roughly the linear order they will execute across the event loop.
If you're willing to drop the first indentation level around the async methods' arguments, you have no extra indentation versus Co-Routines and only a minor number of extra lines of function(callback) { declarations, like this:
var async = require('async');
var someArray = [1, 2, 3, 4, 5, 6, 7, 8, 9];
async.forEach(someArray,
    function(number, callback) {
        // Do something with the number
        callback();
    }, function(err) {
        // Done doing stuff, or one of the calls to the previous function returned an error I need to deal with
    });
In this case, you know that all of the variables your code is using could only have been changed before your code is run if they weren't changed by your code, so you can debug easier, and there is only one "return" mechanism: callback(). You either callback with nothing on success or pass the callback an error when something's gone wrong.
Code reuse not difficult
The above example makes code reuse difficult but it doesn't have to be. You can always pass in named functions as the parameters:
var async = require('async');
var someArray = [1, 2, 3, 4, 5, 6, 7, 8, 9];
// Function declarations are hoisted, so frazzleNumber and doneFrazzling
// can be defined after the call; the array assignment is not hoisted,
// so someArray has to be assigned before it is used
async.forEach(someArray, frazzleNumber, doneFrazzling);
function frazzleNumber(number, callback) {
    // Do something to number
    callback();
}
function doneFrazzling(err) {
    // Do something or handle error
}
Functional, not imperative
The async module discourages the use of imperative-style flow control and encourages (requires, for the parts that cross the event loop) the use of functions for flow control.
The advantage of the functional style is that you can easily re-use the body of your loop or your conditional, and that you can create new control flow "verbs" that better match the flow of your code (demonstrated by the very existence of the async library), like the async.auto control flow method that implements dependency graph resolution for function call order. (You specify a series of named functions and list the other functions, if any, that it depends on to execute, and auto runs first the "independent" functions then the next function that can run based on when its dependent functions have finished running.)
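To give a feel for what a dependency-resolving control flow "verb" does, here is a minimal sketch written with native promises. It is not the async library's API or implementation; runAuto and the shape of the tasks object are assumptions made for this example.

```javascript
// Hypothetical mini version of dependency-ordered execution.
// `tasks` maps a name to { deps: [names], run: function(results) }.
function runAuto(tasks) {
  const started = new Map(); // name -> promise of that task's result

  function start(name, path) {
    if (!Object.prototype.hasOwnProperty.call(tasks, name)) {
      return Promise.reject(new Error('unknown task: ' + name));
    }
    if (path.indexOf(name) !== -1) {
      return Promise.reject(new Error('dependency cycle at: ' + name));
    }
    if (!started.has(name)) {
      const deps = tasks[name].deps || [];
      // Start (or reuse) every dependency, then run this task.
      const promise = Promise.all(
        deps.map((dep) => start(dep, path.concat(name)))
      ).then((values) => {
        const results = {};
        deps.forEach((dep, i) => { results[dep] = values[i]; });
        return tasks[name].run(results);
      });
      started.set(name, promise);
    }
    return started.get(name);
  }

  const names = Object.keys(tasks);
  return Promise.all(names.map((name) => start(name, []))).then((values) => {
    const all = {};
    names.forEach((name, i) => { all[name] = values[i]; });
    return all;
  });
}
```

Tasks without dependencies start right away, and each dependent task runs once, as soon as everything it lists in deps has finished.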
Rather than writing your code to fit the imperative style dictated by your language, you write your code as the logic of the problem dictates, and implement the "glue" control flow to get it to happen.
In Summary
Fiber, by its very nature of extending the Javascript language, cannot develop a large ecosystem within Node.js, especially when Async gets 80% of the way on the looks department, and has none of the other downsides of co-routines in Javascript.
The short answer:
Async is a pure/classic javascript solution to managing single-thread asynchronicity
Fibers is a node.js extension for creating coroutines. It includes a futures library for managing single-thread asynchronicity.
There are many other futures libraries (listed below) that don't require an extension of javascript.
Q_oper8 is a node.js module for managing multi-process concurrency
Note that none of these offer "threads" and so none can be said to do multithreading (though there is a node.js extension for that too: threads_a_gogo).
Async vs Fiber/futures
Async
Async and Fibers/futures are different ways to solve the same problem: managing asynchronously resolving dependencies. Async seems to have many more "bells and whistles" than many other libraries that try to solve this problem, which in my opinion makes it worse (much more cognitive overhead - ie more crap to learn).
In javascript basic asynchronicity looks like this:
asyncCall(someParam, function(result) {
    useThe(result);
});
If you have a situation that requires more than just basic asynchronicity, like where you need the results of two asynchronous calls, you might do something like this:
asyncCall1(someParam, function(result0) {
    asyncCall2(someParam, function(result1) {
        use(result0, result1);
    });
});
Already starts to look like callback hell. Also it's inefficient because the second call is waiting for the first call to complete even though it isn't dependent on it, not to mention the code doesn't even do any sort of reasonable error handling. Async provides one solution to writing it a little more efficiently:
async.parallel([
    function(callback) {
        asyncCall1(someParam, function(result0) {
            callback(null, result0);
        });
    },
    function(callback) {
        asyncCall2(someParam, function(result1) {
            callback(null, result1);
        });
    }
],
function(err, results) {
    use(results[0], results[1]);
});
So to me, that's rather worse than callback hell, but to each his own I suppose. Despite it being ugly, it allows both calls to happen simultaneously (as long as they make non-blocking IO calls or something like that). Async has many more options for managing asynchronous code, so if you're interested take a look at the documentation.
Enter fiber/futures
The Fibers module includes a futures library that uses coroutines to re-inject asynchronous events back into the current continuation (future.wait()).
Fibers is different from most other futures libraries because it allows the current continuation to wait on an asynchronous event - meaning it doesn't require the use of callbacks in order for you to get a value back from an async request - allowing asynchronous code to become synchronous-like. Read about coroutines for more about that.
Node.js has io functions like readFileSync, which lets you wait on the function in-line while it gets the file for you. This is not something that is normally done in javascript, and isn't something that can be written in pure javascript - it requires an extension like Fibers.
Going back to the same asynchronous example above, this is what it would look like with fibers/futures:
var future0 = asyncCall1(someParam);
var future1 = asyncCall2(someParam);
use(future0.wait(), future1.wait());
This is drastically simpler and just as efficient as the Async mess up there. It avoids callback-hell in an elegant efficient way. There are (minor) downsides though. David Ellis overstated many of the downsides, so I'll repeat the only valid one here:
Browser Incompatibility
By virtue of Fibers being a node.js extension, it will not be compatible with browsers. This will make sharing code that uses fibers impossible with both a node.js server and the browser. However, there is a strong argument that most asynchronous code you want on the server (filesystem, database, network calls) is not the same code you want on a browser (ajax calls). Maybe timeouts collide, but that seems like it.
Beyond that, the streamline.js project has the ability to bridge this gap. Seems like it has a compilation process that can transform streamline.js code using synchronization and futures into pure javascript using the callback style, similar to the now unsupported Narrative Javascript. Streamline.js can use a couple different mechanisms behind the scenes, one being node.js Fibers, another being ECMAScript 6 generators, and the last being translation into callback-style javascript which I already mentioned.
More difficult debugging
This one seems like a valid, if minor, gripe. Even if you're just planning on using fibers/futures, and not using coroutines for anything else, there might still be confusing context switches because of unexpected function exit (and entrance) points.
Introduces pre-emptiveness into javascript
This is probably the most major problem with fibers, since it has the possibility (however unlikely) of introducing hard-to-understand bugs. Basically, because a Fiber yield can cause a temporary exit of a set of code to another undetermined function, it's possible that some invalid state can be read or introduced. See this article for more info. Personally, I think the incredible cleanness of fibers/futures and similar structures is well worth the rare insidious bugs. Many more bugs are caused by awful concurrency code.
Invalid gripes
Not on windows: this just isn't true anymore
Unfamiliarity with coroutines: A. Unfamiliarity is never a reason to shun something. If it's good it's good, regardless of how familiar you are with it. B. While coroutines and yields may be unfamiliar, futures are an easy concept to understand.
Other futures libraries
There are many libraries that implement futures, where the concept may be called "futures", "deferred objects", or "promises". This includes libraries like async-future, streamline.js, Q, when.js, promiscuous, jQuery's deferred, coolaj86's futures, kriszyp's promises, and Narrative Javascript.
Most of these use callbacks to resolve the futures, which gets around many of the problems Fibers introduces. However, they aren't quite as clean as fibers/futures, though they are far cleaner than Async. Here's the same example again using my own async-future:
var future0 = asyncCall1(someParam);
var future1 = asyncCall2(someParam);
Future.all([future0, future1]).then(function(results) {
use(results[0], results[1])
}).done()
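For comparison, the same shape can be sketched with the Promise object that has been built into the language since ES2015, with no library at all. Here asyncCall1, asyncCall2 and useBoth are hypothetical stand-ins that simulate work with timers.

```javascript
// Hypothetical async operations that return native promises.
function asyncCall1(param) {
  return new Promise((resolve) => setTimeout(() => resolve(param + 1), 10));
}
function asyncCall2(param) {
  return new Promise((resolve) => setTimeout(() => resolve(param * 2), 10));
}

function useBoth(someParam) {
  // Both calls are started before either result is needed.
  const future0 = asyncCall1(someParam);
  const future1 = asyncCall2(someParam);
  return Promise.all([future0, future1]).then((results) => {
    return results[0] + results[1];
  });
}
```

As with the library version, the callback passed to then runs only after both futures have resolved.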
Q_oper8
Q_oper8 is really a different beast. It runs jobs in a queue using a pool of processes. Since javascript is single-threaded, and javascript doesn't have native threading available, processes are the usual way to take advantage of more than one processor in node.js. Q_oper8 is intended as an alternative to managing processes using node.js's child_process module.
You should also check out Step.
It handles only a small subset of what async can do, but I think the code is much easier to read. It's great for just handling the normal case of doing a sequence of things, with some of those things happening in parallel.
I tend to use Step for the bulk of my logic, and then use async occasionally when I need to apply methods repeatedly in serial or parallel execution (ie - call this function until, or call this function on each element of this array).
I'm using jQuery's Deferred functionality on the client and jQuery Deferred for nodejs on the server in place of nested callbacks. It has greatly reduced the code and made things so readable.
http://techishard.wordpress.com/2012/05/23/promises-promises-a-concise-pattern-for-getting-and-showing-my-json-array-with-jquery-and-underscore/
http://techishard.wordpress.com/2012/05/29/making-mongoose-keep-its-promises-on-the-server/

Implementing "Generator" support in a custom language

I've got a bit of a fetish for language design and I'm currently playing around with my own hobby language. (http://rogeralsing.com/2010/04/14/playing-with-plastic/)
One thing that really makes my mind bleed is "generators" and the "yield" keyword.
I know C# uses AST transformation to transform enumerator methods into statemachines.
But how does it work in other languages?
Is there any way to get generator support in a language w/o AST transformation?
e.g. Do languages like Python or Ruby resort to AST transformations to solve this too?
(The question is how generators are implemented under the hood in different languages, not how to write a generator in one of them)
Generators are basically semi-coroutines with some annoying limitations. So, obviously, you can implement them using semi-coroutines (and full coroutines, of course).
If you don't have coroutines, you can use any of the other universal control flow constructs. There are a lot of control flow constructs that are "universal" in the sense that every control flow construct (including all the other universal control flow constructs), including coroutines and thus generators can be (more or less) trivially transformed into only that universal construct.
The most well-known of those is probably GOTO. With just GOTO, you can build any other control flow construct: IF-THEN-ELSE, WHILE, FOR, REPEAT-UNTIL, FOREACH, exceptions, threads, subroutine calls, method calls, function calls and so on, and of course also coroutines and generators.
Almost all CPUs support GOTO (although in a CPU, they usually call it jmp). In fact, in many CPUs, GOTO is the only control flow construct, although today native support for at least subroutine calls (call) and maybe some primitive form of exception handling and/or concurrency primitive (compare-and-swap) are usually also built in.
Another well-known control flow primitive is continuations. Continuations are basically a more structured, better manageable and less evil variant of GOTO, especially popular in functional languages. But there are also some low-level languages that base their control flow on continuations, for example the Parrot Virtual Machine uses continuations for control flow and I believe there are even some continuation-based CPUs in some research lab somewhere.
C has a sort-of "crappy" form of continuations (setjmp and longjmp), that are much less powerful and less easy to use than "real" continuations, but they are plenty powerful enough to implement generators (and in fact, can be used to implement full continuations).
On a Unix platform, setcontext can be used as a more powerful and higher level alternative to setjmp/longjmp.
Another control flow construct that is well-known, but probably doesn't spring to mind as a low-level substrate to build other control flow constructs on top of, is exceptions. There is a paper that shows that exceptions can be more powerful than continuations, thus making exceptions essentially equivalent to GOTO and thus universally powerful. And, in fact, exceptions are sometimes used as universal control flow constructs: the Microsoft Volta project, which compiled .NET bytecode to JavaScript, used JavaScript exceptions to implement .NET threads and generators.
Not universal, but probably powerful enough to implement generators is just plain tail call optimization. (I might be wrong, though. I unfortunately don't have a proof.) I think you can transform a generator into a set of mutually tail-recursive functions. I know that state machines can be implemented using tail calls, so I'm pretty sure generators can, too, since, after all, C# implements generators as state machines. (I think this works especially well together with lazy evaluation.)
Last but not least, in a language with a reified call stack (like most Smalltalks for example), you can build pretty much any kind of control flow constructs you want. (In fact, a reified call stack is basically the procedural low-level equivalent to the functional high-level continuation.)
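The state-machine approach mentioned above can be sketched by hand in JavaScript: a generator next to an equivalent object that stores "where to resume" in an explicit state variable. This is only an illustration of the idea, not what any particular compiler emits, and countdownGen and countdownMachine are made-up names.

```javascript
// The generator version: the language remembers where to resume.
function* countdownGen(n) {
  while (n > 0) {
    yield n;
    n--;
  }
  return 'done';
}

// The hand-written version: the resume point is an explicit state.
function countdownMachine(n) {
  let state = 0; // 0 = at loop check, 1 = just after the yield, 2 = finished
  return {
    next() {
      while (true) {
        switch (state) {
          case 0:
            if (n > 0) {
              state = 1; // resume after the "yield" next time
              return { value: n, done: false };
            }
            state = 2;
            return { value: 'done', done: true };
          case 1:
            n--;       // the code that followed the yield
            state = 0; // jump back to the loop check
            break;
          default:
            return { value: undefined, done: true };
        }
      }
    },
    [Symbol.iterator]() {
      return this;
    },
  };
}
```

Both versions can be consumed the same way, for example with a for...of loop or the spread operator.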
So, what do other implementations of generators look like?
Lua doesn't have generators per se, but it has full asymmetric coroutines. The main C implementation uses setjmp/longjmp to implement them.
Ruby also doesn't have generators per se, but it has Enumerators, that can be used as generators. Enumerators are not part of the language, they are a library feature. MRI implements Enumerators using continuations, which in turn are implemented using setjmp/longjmp. YARV implements Enumerators using Fibers (which is how Ruby spells "coroutines"), and those are implemented using setjmp/longjmp. I believe JRuby currently implements Enumerators using threads, but they want to switch to something better as soon as the JVM gains some better control flow constructs.
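Python has generators that are actually more or less full-blown coroutines. CPython implements them by keeping each generator's frame object alive on the heap and resuming its bytecode evaluation on the next call, rather than by using setjmp/longjmp.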

producer-consumer using assignment

I am interested in finding out whether the producer-consumer problem, with multiple producers and multiple consumers, can be solved without using assignment, i.e., in a functional style of programming. How?
Producer-consumer problem
Thanks
Yes, you can do it quite nicely with message passing in Concurrent ML. Don't be put off by the age of the system; John Reppy's book and papers are excellent guides to the topic. Beautiful stuff!
Having multiple threads necessarily requires impure (non-functional) actions. Pure functional programming considers your application to be the evaluation of a function. The concept of concurrently evaluating two things and passing data between them is not meaningful within this framework.
Although one can evaluate multiple parts of a function in parallel, as in Haskell's `par` operator, this is not the same as the producer-consumer problem, and as such I don't think you'll be able to solve it in a functional way.
Yes. Check out functional reactive programming (FRP), which is related to Concurrent ML (Norman's suggestion) but is purely functional. The semantics of FRP is highly "concurrent" while having a simple, precise, deterministic, functional semantic model (functions of time).
Edit: I quoted "concurrent" here, because I do not mean the usual operational (implementation-oriented) notion of concurrency, which is imperative and non-deterministic, thus impeding practical & dependably correct reasoning.
All of the implementations I've seen of producer-consumer in SML were forced to rely on refs (in order to maintain a queue of 'sleeping' items), so I'd be inclined to say "no".
There are many ways to solve this; each has different drawbacks.
For example, the "put" could spawn a new thread each time. That way, you wouldn't need a buffer at all. If lots of requests come in, you spawn lots of threads until your CPU is more busy with switching between threads than actually executing them. But this just moves the problem from your code into the OS: At a certain point, you always have to synchronize access to a variable in memory. The OS must maintain a list of threads and access to this list must be synchronized.
Either you want to limit the number of threads (then a "put" must be able to read the variable while threads might terminate at the same time and decrement it -> again synchronized access). Or you risk running out of resources because you have too many threads.
You could post a message when "put" is called and the consumers could listen to the message. But that's only a complex way to implement "wait" for threads. And you need a way to make sure that only a single consumer gets the message. Again, you'll need some synchronized data structure.
So in the end, it's not really the assignment which is the problem, but concurrent access to a single variable, and no matter how you try, for any implementation of producer-consumer you must be able to do this (or the whole thing will be single-threaded).
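The message-posting variant can be sketched in JavaScript to show where the shared state ends up. This is a single-threaded toy, so it sidesteps real synchronization; createChannel, put and take are hypothetical names. The point is only that the mutable queue does not disappear, it just moves inside the channel.

```javascript
// Hypothetical channel: each message is delivered to exactly one consumer.
function createChannel() {
  const buffer = [];  // messages nobody has taken yet (mutable state)
  const waiting = []; // consumers that arrived before any message

  return {
    put(message) {
      const consumer = waiting.shift();
      if (consumer) {
        consumer(message); // hand over directly to the oldest waiter
      } else {
        buffer.push(message);
      }
    },
    take() {
      if (buffer.length > 0) {
        return Promise.resolve(buffer.shift());
      }
      // Nothing available: park this consumer until a put arrives.
      return new Promise((resolve) => waiting.push(resolve));
    },
  };
}
```

Producers and consumers written against put and take contain no assignment of their own, but the two arrays inside the channel are exactly the synchronized data structure described above.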
