I am looking for the name and info on a pattern (?) that I'm contemplating. As I don't know the name, it's difficult to search for it.
Here's what I'm trying to do, with a totally hypothetical example...
find-flight(trip-details, 'cheapest') :: flight
I have this public function, find-flight(...), that accepts 2 parameters, trip-details and some instruction to apply. It returns a single flight, not a list of them.
When I call it with 'cheapest', the function will wait for all available flight results to come in from Expedia, Travelocity, Hotwire, etc., to ensure the cheapest flight is found.
When I call it with 'shortest-flight', the function would do the same kind of underlying work as 'cheapest' but would return the shortest flight. I'm sure you can come up with other instructions to apply.
BUT! I'm specifically interested in a variant of this: the instruction (implied or not) would be 'I-am-filthy-rich-and-I-want-to-buy-a-ticket-now'. In other words, the function would call all the sources such as Expedia, Orbitz, etc., but would return the very first internally received result, at any price point.
I'm asking because I want my public function to be as quick as possible. I can think of a number of strategies that would make it respond fast, but I'm not sure which approach would be best, given that the instruction parameter isn't known until the function is called.
So I'm thinking about writing various versions of this function, all of which the public version would call; the public function would return the first result to complete, and the remaining strategies could then optionally be aborted. If I did that, I could gather some metrics on the function and optimize further.
If I were to write this in Java, I'd have a bunch of future objects that the function would loop through to see which one is done first. I'd return that one.
What's that called?
It's been proposed that the pattern is called Promise Race, after JavaScript's built-in Promise.race.
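For a concrete illustration, here is a minimal TypeScript sketch of both the race variant and the wait-for-all 'cheapest' variant; the per-site search function and the Flight/TripDetails shapes are hypothetical stand-ins:

type TripDetails = { from: string; to: string; date: string };
type Flight = { source: string; price: number };

// Hypothetical stand-in for querying one site: resolves after a random delay.
// (A real implementation would actually use trip to build the query.)
function searchSite(source: string, trip: TripDetails): Promise<Flight> {
    return new Promise((resolve) =>
        setTimeout(() => resolve({ source, price: 100 + Math.random() * 400 }),
                   Math.random() * 2000));
}

const sources = ["expedia", "travelocity", "hotwire"];

// 'I-am-filthy-rich...': return whichever source answers first, at any price.
function findFirstFlight(trip: TripDetails): Promise<Flight> {
    return Promise.race(sources.map((s) => searchSite(s, trip)));
}

// 'cheapest': wait for every source, then pick the lowest price.
async function findCheapestFlight(trip: TripDetails): Promise<Flight> {
    const flights = await Promise.all(sources.map((s) => searchSite(s, trip)));
    return flights.reduce((a, b) => (a.price <= b.price ? a : b));
}

One caveat: Promise.race settles as soon as the first promise settles, even if it rejects; Promise.any (ES2021) instead waits for the first fulfilled result. In Java, CompletableFuture.anyOf or ExecutorService.invokeAny plays the same role as the loop over Future objects described in the question.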
Let foo be a sub or method. I have programmed a blocking and an async variant, so looking from the outside the essential difference is in the return value. I first thought of specifying it in the signature, but the dispatcher unfortunately only looks at the incoming end instead of both ends:
> multi sub foo (--> Promise) {}; multi sub foo (--> Cool) {};
> my Promise $p = foo
Ambiguous call to 'foo(...)'; these signatures all match:
:( --> Promise)
:( --> Cool)
in block <unit> at <unknown file> line 1
Should I add a Bool :$async to the signature? Should I add a name suffix (i.e. foo and foo-async) like in JS? Neither feels very perlish to me. What are the solutions currently in use for this problem?
Multiple dispatch on return type cannot work, since the return value itself could be used as the argument to a multiple dispatch call (and since nearly every operator in Perl 6 is a multiple dispatch call, this would be a very common occurrence).
As to the question at hand: considering code in core, modules, and a bunch of my own code, it seems that a given class or module will typically offer a synchronous interface or an asynchronous interface, whichever feels most natural for the problem at hand. In cases where both make sense, they are often differentiated at type or module level. For example:
Core: there are both Proc and Proc::Async, and IO::Socket::INET and IO::Socket::Async. While it's sometimes the case that a reasonable asynchronous API can be obtained by providing Promise-returning alternatives for each synchronous routine, in other cases the overall workflow will be a bit different. For example, for a synchronous socket API it's quite reasonable to sit in a loop asking for data, whereas the asynchronous API is much more naturally expressed in Perl 6 by providing a Supply of the packets arriving over the network.
Libraries: Cro::HTTP::Client offers a consistently asynchronous interface to doing HTTP requests. There is no synchronous API.
Applications: considering a lot of my application code, things seem to be either consistently synchronous or consistently asynchronous in terms of their API. The only exceptions I'm finding are classes that are almost entirely synchronous, except they have a few Supply-returning methods in order to provide notifications of events. This isn't really the case being asked about, however, since notifications are naturally asynchronous.
It's interesting that we've ended up here, in contrast to various other languages where providing async variants through a naming convention is common. I think much of the reason is that one can use await anywhere in Perl 6. This is not the case in languages that have an async/await pair, where in order to use await one must first refactor the calling routine to be async, and then refactor its callers to be async, and so on.
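To illustrate the contrast, a tiny TypeScript sketch (names made up): to use await inside a routine, the routine itself must be declared async, and that requirement ripples out to every caller:

// Stand-in for some asynchronous work.
async function fetchScore(id: string): Promise<number> {
    return 42;
}

// To await fetchScore, this function must itself become async...
async function report(id: string): Promise<string> {
    const score = await fetchScore(id);
    return `score: ${score}`;
}

// ...which in turn forces its callers to become async, and so on upward.
async function main(): Promise<void> {
    console.log(await report("a1"));
}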
So if we are writing a completely synchronous bit of code and want to use something from a module that returns a Promise, our entire cost is "just write await". That's it. Writing in a call to await is the same length as a -sync or -async suffix, or a :sync or :async named argument.
On the other hand, one might choose to provide a synchronous API to something, even if on the inside it is doing an await, because the feeling is most consumers will just want to use it synchronously. Should someone wish to call it asynchronously, there's another 5-letter word, start, that will trigger it on the threadpool, and any await that is performed inside of the code will not (assuming Perl 6.d) block up a real thread, but instead just schedule it to continue when the awaited work is done. That's, again, the same length as - or shorter than - writing an async suffix, named argument, etc.
Which means the pattern we seem to be ending up with (given the usual caveats about young languages, and conventions taking time to evolve) is:
For the simple case: pick the most common use case and provide that, letting the caller adapt it with start (sync -> async) or await/react (async -> sync) if they want the other thing
For more complex cases where the sync and async workflows for using the functionality might look quite different, and are both valuable: provide them separately. Granted, one may be a facade of the other (for example, Proc in core is actually just a synchronous adaptation layer over Proc::Async).
A final observation I'd make is that individual consumers of a module will almost certainly be using it synchronously or asynchronously, not a mixture of the two. If wishing to provide both, then I'd probably instead look to using export tags, so I can do:
use Some::Thing :async;
say await something();
Or:
use Some::Thing :sync;
say something();
And not have to declare which I want upon each call.
What's the syntax for accessing a subroutine's Capture once it's called? self only works for objects, and &?ROUTINE refers to the static routine, not its state once called. So first, is it possible to access the routine's Capture from inside? If so, what's the syntax for accessing it? I've looked at the related Synopsis but I can't find a way, if there's one.
There's no way to do exactly what you're asking for. While conceptually arguments are passed by forming a Capture object holding them, which is then unpacked by a signature, for most calls no Capture ever really exists. With every operator in Perl 6 being a multi-dispatch subroutine call, the performance of calling is important, and the language design is such that there's plenty of room for implementations to cheat in order to achieve acceptable performance.
It is possible to explicitly ask for a Capture, however:
sub foo(|c ($a, $b)) { say c.perl; }
foo(1, 2);
This will capture the arguments into c and then unpack them also into $a and $b, enforcing that inner signature.
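For readers coming from JavaScript, a loose TypeScript analogue of this capture-then-unpack signature (loose, because a Raku Capture also carries named arguments) might be:

// Roughly like `sub foo(|c ($a, $b))`: a tuple rest parameter grabs the whole
// positional argument list, which is then destructured into a and b.
function foo(...c: [number, number]): void {
    const [a, b] = c;
    console.log(c, a, b);
}

foo(1, 2);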
One might realize that things like callsame do indeed find a way to access the arguments to pass them on, even though no Capture appears in the signature. Their need to do so causes the compiler to opt any routine containing a callsame out of various optimizations, which would otherwise discard information needed to discover the arguments. This isn't ideal, and it's probable it will change in the future - most likely by finding a way to sneak a |SECRET-CAPTURE into the signature or similar.
In his article "Why Functional Programming Matters," John Hughes argues that "Lazy evaluation is perhaps the most powerful tool for modularization in the functional programmer's repertoire." To do so, he provides an example like this:
Suppose you have two functions, "infiniteLoop" and "terminationCondition." You can do the following:
terminationCondition(infiniteLoop input)
Lazy evaluation, in Hughes' words "allows termination conditions to be separated from loop bodies." This is definitely true, since "terminationCondition" using lazy evaluation here means this condition can be defined outside the loop -- infiniteLoop will stop executing when terminationCondition stops asking for data.
But couldn't higher-order functions achieve the same thing as follows?
infiniteLoop(input, terminationCondition)
How does lazy evaluation provide modularization here that's not provided by higher-order functions?
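For concreteness, here are the question's two shapes sketched in TypeScript, with generators standing in for lazy lists (all names are made up):

// Lazy shape: an 'infinite loop' as a generator; it runs only as far as pulled.
function* iterate(f: (x: number) => number, x: number): Generator<number> {
    while (true) {
        yield x;
        x = f(x);
    }
}

// The termination condition lives entirely on the consumer's side.
function* takeWhile<T>(p: (x: T) => boolean, xs: Iterable<T>): Generator<T> {
    for (const x of xs) {
        if (!p(x)) return;
        yield x;
    }
}

// terminationCondition(infiniteLoop(input)):
const lazy = [...takeWhile((x) => x < 100, iterate((x) => x * 2, 1))];

// Higher-order shape: the producer accepts the condition as a parameter,
// fixing in advance how, and on what, the condition is called.
function iterateUntil(f: (x: number) => number, x: number,
                      done: (x: number) => boolean): number[] {
    const out: number[] = [];
    while (!done(x)) {
        out.push(x);
        x = f(x);
    }
    return out;
}

const strict = iterateUntil((x) => x * 2, 1, (x) => x >= 100);
// Both lazy and strict are [1, 2, 4, 8, 16, 32, 64].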
Yes, you could use a passed-in termination check, but for that to work the author of infiniteLoop would have had to foresee the possibility of wanting to terminate the loop with that sort of condition, and hardwire a call to the termination condition into their function.
And even if the specific condition can be passed in as a function, the "shape" of it is predetermined by the author of infiniteLoop. What if they give me a termination condition "slot" that is called on each element, but I need access to the last several elements to check some sort of convergence condition? Maybe for a simple sequence generator you could come up with "the most general possible" termination condition type, but it's not obvious how to do so and remain efficient and easy to use. Do I repeatedly pass the entire sequence so far into the termination condition, in case that's what it's checking? Do I force my callers to wrap their simple termination conditions up in a more complicated package so they fit the most general condition type?
The callers certainly have to know exactly how the termination condition is called in order to supply a correct condition. That could be quite a bit of dependence on this specific implementation. If they switch to a different implementation of infiniteLoop written by another third party, how likely is it that exactly the same design for the termination condition would be used? With a lazy infiniteLoop, I can drop in any implementation that is supposed to produce the same sequence.
And what if infiniteLoop isn't a simple sequence generator, but actually generates a more complex infinite data structure, like a tree? If all the branches of the tree are independently recursively generated (think of a move tree for a game like chess) it could make sense to cut different branches at different depths, based on all sorts of conditions on the information generated thus far.
If the original author didn't prepare (either specifically for my use case or for a sufficiently general class of use cases), I'm out of luck. The author of the lazy infiniteLoop can just write it the natural way, and let each individual caller lazily explore what they want; neither has to know much about the other at all.
Furthermore, what if the decision to stop lazily exploring the infinite output is actually interleaved with (and dependent on) the computation the caller is doing with that output? Think of the chess move tree again; how far I want to explore one branch of the tree could easily depend on my evaluation of the best option I've found in other branches of the tree. So either I do my traversal and calculation twice (once in the termination condition to return a flag telling infiniteLoop to stop, and then once again with the finite output so I can actually have my result), or the author of infiniteLoop had to prepare for not just a termination condition, but a complicated function that also gets to return output (so that I can push my entire computation inside the "termination condition").
Taken to extremes, I could explore the output and calculate some results, display them to a user and get input, and then continue exploring the data structure (without recalling infiniteLoop based on the user's input). The original author of the lazy infiniteLoop need have no idea that I would ever think of doing such a thing, and it will still work. If we've got purity enforced by the type system, then that would be impossible with the passed-in termination condition approach unless the whole infiniteLoop was allowed to have side effects if the termination condition needs to (say by giving the whole thing a monadic interface).
In short, allowing the same flexibility you'd get with lazy evaluation by using a strict infiniteLoop that takes higher-order functions to control it can mean a large amount of extra complexity for both the author of infiniteLoop and its caller (unless a variety of simpler wrappers are exposed, and one of them matches the caller's use case). Lazy evaluation can allow producers and consumers to be almost completely decoupled, while still giving the consumer the ability to control how much output the producer generates. Everything you can do that way you can do with extra function arguments, as you say, but it requires the producer and consumer to essentially agree on a protocol for how the control functions work; and that protocol is almost always either specialised to the use case at hand (tying the consumer and producer together) or so complicated, in order to be fully general, that both end up tied to that protocol, which is unlikely to be recreated elsewhere, so they're still tied together.
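To illustrate that decoupling with the convergence case mentioned earlier (and mirroring the Newton-Raphson square-root example in Hughes' paper), here is a TypeScript sketch using a generator as the lazy producer:

// The producer just emits successive approximations, forever.
function* approximations(n: number): Generator<number> {
    let x = n;
    while (true) {
        yield x;
        x = (x + n / x) / 2;
    }
}

// The consumer compares *pairs* of results -- a condition shape the producer
// never had to anticipate -- and simply stops pulling once they converge.
function sqrtWithin(eps: number, n: number): number {
    let prev = Infinity;
    for (const x of approximations(n)) {
        if (Math.abs(x - prev) < eps) return x;
        prev = x;
    }
    throw new Error("unreachable: the generator is infinite");
}

console.log(sqrtWithin(1e-9, 2)); // ~1.4142135623730951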
I see the following approach often when working on certain projects that use Node.js and Bluebird.js:
function someAsyncOp(arg) {
    return somethingAsync(arg).then(function (results) {
        return somethingElseAsync(results);
    });
}
That is, creating a wrapper function/closure around another function that accepts the exact same arguments. It seems this could be written more cleanly as:
function someAsyncOp(arg) {
    return somethingAsync(arg).then(somethingElseAsync);
}
When I propose it to others, they usually like it and switch to it.
There is, however, an important caveat: if you're calling something like object.function, and the function relies on this (like console.log does), then this will lose its binding. You have to do object.function.bind(object):
return somethingAsync(arg).then(somethingElseAsync).catch(console.log.bind(console));
This does seem potentially undesirable, and the .bind call feels a little awkward. You can't go wrong with the let's-always-do-the-closure approach.
I can't seem to find any discussion of this on Google, and there doesn't seem to be anything in ESLint about unnecessary wrapper functions. I'm trying to find out more about it, so here I am. I guess it's a case of I don't know what I don't know. Is there a name for this? (Useless use of closures?) Any other thoughts or wisdoms? Thank you.
Edit: someone's going to comment that someAsyncOp is also redundant, yes, it is, let's pretend it does something useful.
The discussion here is pretty straightforward. If your function is OK being called directly by the promise system, with the exact arguments and this value that will be in place when it's called directly by the promise system, and its return value is exactly what you want in the promise chain, then by all means, just specify the function reference directly as the .then() handler:
somethingAsync(arg).then(somethingElseAsync)
But, if your function isn't set up to be called directly that way, then you need a wrapper function or something like .bind() to fix the mismatch and call your function exactly as you want or set up the proper return value.
There's really nothing more to it than that. It's no different than specifying any callback anywhere in JavaScript. If you have a function that already meets the specs of the callback exactly, then you can specify that function name as a direct reference with no wrapper. But, if the function you have doesn't quite work the way the callback is designed to work, then you use a wrapper function to smooth over the mismatch.
All callback functions have the same issue with passing obj.method as the callback. If your .method expects the this value to be obj, then you will probably have to do something to make sure that the this value is set accordingly before your function executes. The callbacks in .then() handlers are no different than callbacks for any other JavaScript/Node.js function such as setTimeout() or fs.readFile() or any other function that takes a callback as an argument. So, neither of the issues you mention is unique to promises at all. It just so happens that promises live by callbacks, so if you're trying to make method calls via a callback, you will run into the issue of getting the object value passed appropriately to the method.
FYI, it is possible to code methods so that they are permanently bound to their own object and can be passed as obj.method, but that can only be done in your method implementation and has some other tradeoffs. In general, experienced JavaScript developers are perfectly fine using obj.method.bind(obj) as the reference to pass. Seeing the .bind() in the code also indicates that you're aware that you need the proper obj value inside the method and that you have made a provision for that.
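For illustration, a small TypeScript sketch of the detached-method problem and the two usual fixes (the logger object is made up):

const logger = {
    prefix: "[app]",
    log(msg: string): void {
        console.log(this.prefix, msg);
    },
};

// Passing the method bare detaches it from logger: inside log, `this` is no
// longer logger (it's undefined in strict mode), so this.prefix goes wrong.
Promise.resolve("hello").then(logger.log);

// Fix 1: bind the method to its object.
Promise.resolve("hello").then(logger.log.bind(logger));

// Fix 2 (ES6): an arrow-function wrapper, which invokes it as a method call.
Promise.resolve("hello").then((msg) => logger.log(msg));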
As for some of your bolded questions or comments:
Is there a name for this?
Not that I'm aware of. Technically it's "passing a named reference to a previously defined function as a callback", but I doubt that's something you can search for and find useful discussion of.
Any other thoughts or wisdoms?
For reasons I'm not entirely sure of (though this has been a topic of discussion elsewhere), JavaScript programming style conventions seem to encourage the use of anonymous inline callbacks rather than defining a method or function elsewhere and then passing that named reference (like you would be more likely to do in many other languages). Obviously, if you put the actual code to process the callback in an inline anonymous function, then neither of the issues you mention comes up. Using arrow functions in ES6 now even allows you to preserve the current value of this in the inline callback. I'm not saying that this is an answer to your question, just an observation about common JavaScript coding conventions.
You can't go wrong with the let's-always-do-the-closure approach.
As you seem to already know, it's a waste to wrap something if it doesn't need wrapping. I would vote for wrapping only when there's a mismatch between the specification for the callback and the already existing named function and there's a reason not to just fix the named function to match the specification of the callback.
I came across the following post, but had a hard time understanding it: self message (non-recursive) vs. self recursive message.
I also came across the example at http://www.zicomi.com/messageRecursion.jsp, hoping a real-world scenario would help, but that confused me further. Why would you need a recursive message when the order has been passed to the kitchen and chef? I would have thought all you needed was a self message, i.e. the chef completing the order and then passing it to the waiter.
The chef example is arguably "wrong" in what it shows and describes.
Simply put, a message to self just means that the method that is to be invoked next happens to be in the same class of objects. E.g. a call to SavingsAccount.withdraw(anAmount) may call SavingsAccount.getBalance() to determine if there are enough funds to continue with the withdrawal.
A recursive call is a special case of a call to self in that it is to the same method (with different state, so that it can eventually return out of the recursive calls). Some problems lend themselves to this solution. An example is factorial (see Factorial). Computing a factorial without recursion or an explicit loop would be impractical for all but the simplest cases, due to the volume of inline code needed. If you look in the factorial code example, you'll see that the argument is reduced by one each time (factorial(n-1)) and the recursion stops when n reaches zero. Trying to write this out inline for a value like 1,000,000 would not be feasible.
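Both kinds of call from this answer, sketched in TypeScript (the class is illustrative):

class SavingsAccount {
    constructor(private balance: number) {}

    getBalance(): number {
        return this.balance;
    }

    // Message to self: withdraw invokes a *different* method on the same object.
    withdraw(amount: number): boolean {
        if (this.getBalance() < amount) return false;
        this.balance -= amount;
        return true;
    }
}

// Recursive message: the same method invokes itself with different state
// (n shrinks by one each call) until the base case n === 0 is reached.
function factorial(n: number): number {
    return n === 0 ? 1 : n * factorial(n - 1);
}

console.log(factorial(5)); // 120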