Is it usual for interaction nets to leave piles of redundant fans? - haskell

I'm compiling lambda calculus terms to interaction nets in order to evaluate them using Lamping's abstract algorithm. In order to test my implementation, I used this church-number division function:
div = (λ a b c d . (b (λ e . (e d)) (a (b (λ e f g . (e (λ h . (f h g)))) (λ e . e) (λ e f . (f (c e)))) (b (λ e f . e) (λ e . e) (λ e . e)))))
Dividing 4 by 4 (that is, (λ k . (div k k)) (λ f x . (f (f (f (f x)))))), I get this net:
(Sorry for the awful rendering. λ is a lambda, R is root, D is a fan, e is eraser.)
Reading this term back, I get the church number 1, as expected. But this net is very inflated: it has a lot of fans and erasers that serve no obvious purpose. Dividing bigger numbers is even worse. Here is div 32 32:
This again reads back as one, but here we can see an even longer tail of redundant fan nodes. My question is: is this expected behavior of interaction nets when reducing that particular term, or is it a possible bug in my implementation? If it isn't a bug, is there any way around it?

Yes, it is usual (but there are techniques to lessen its presence)
Setting aside some details of your implementation of interaction nets,
and assuming, as you do, that the abstract algorithm is sound for your div,
everything looks just fine to me.
No further interaction can be applied to the output you show, in spite of chi's claim, because none of the D-e pairs can interact through their principal ports.
This latter kind of reduction rule (which is not allowed by the IN framework) may improve efficiency, and it is also sound in some particular cases.
Basically, the fan involved must not have any "twin", i.e. there must exist no D' in the net such that the annihilation D-D' could eventually happen.
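To make the principal-port condition concrete, here is a very schematic sketch (my own toy types, not your implementation) of why those leftover pairs are stuck:
data Node = Lam | App | Fan | Era          -- agent kinds in a sharing graph
data Port = Principal | Aux1 | Aux2        -- each agent has exactly one principal port

-- An interaction rule may only fire on a wire joining two principal ports.
isActivePair :: (Node, Port) -> (Node, Port) -> Bool
isActivePair (_, Principal) (_, Principal) = True
isActivePair _              _              = False

-- In the nets above, every leftover fan faces an eraser (or another fan)
-- through an auxiliary port, so isActivePair is False everywhere and the net
-- is already in normal form, even though an extra "absorb the fan" rule through
-- an auxiliary port would shrink it (and is sound only for "safe" fans).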
For more details, look at The Optimal Implementation of Functional Programming Languages, chapter Safe nodes (which is available online!), or at the original paper from which that came:
Asperti, Andrea, and Juliusz Chroboczek. "Safe Operators: Brackets Closed Forever Optimizing Optimal λ-Calculus Implementations." Applicable Algebra in Engineering, Communication and Computing 8.6 (1997): 437-468.
Finally, the read-back procedure should be understood not as some external cost added on top of your reduction procedure, but rather as the deferred cost of performing duplication and erasure.
As you noticed, that cost is rarely negligible, so if you want to measure efficiency in a realistic scenario, always add up both the sharing-reduction work and the read-back work.

Related

Haskell: why use @ (as-patterns)?

Came across this example:
buildEntry ws@(w:_) = (w, length ws)
What's the advantage of using ws@(w:_) instead of:
buildEntry' (w:ws) = (w, length (w:ws))
Since I am only a beginner, I think readability of the second example is also better.
Readability is not always improved by removing the as-pattern. Compare
foo x@(C a b (D c d) (E e f)) = bar x
with
foo (C a b (D c d) (E e f)) = bar (C a b (D c d) (E e f))
I think the first one is more readable.
Further, there might be some performance differences. As far as I know, using x@(C ...) and then referring to x later will make GHC bind x to the very same heap object holding (C ...). By contrast, writing (C ...) again with the same arguments can allocate a new "object" which is a copy of the original one.
Of course, GHC might apply a sort of CSE (common subexpression elimination) to keep only one copy around. However, since CSE is not always beneficial for performance, GHC is quite conservative about using it.
Right now, I cannot see any issues in doing that optimization in this specific case, so maybe GHC could use CSE after all. Still, I'm unsure about this.
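For completeness, a tiny runnable version of both definitions (buildEntry and buildEntry' are from the question; the empty-list equations and main are mine):
buildEntry :: [a] -> (a, Int)
buildEntry ws@(w:_) = (w, length ws)         -- ws names the whole matched list
buildEntry []       = error "buildEntry: empty list"

buildEntry' :: [a] -> (a, Int)
buildEntry' (w:ws) = (w, length (w:ws))      -- the matched list is rebuilt by hand
buildEntry' []     = error "buildEntry': empty list"

main :: IO ()
main = print (buildEntry "hello", buildEntry' "hello")   -- (('h',5),('h',5))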

Sharing vs. non-sharing fixed-point combinator

This is the usual definition of the fixed-point combinator in Haskell:
fix :: (a -> a) -> a
fix f = let x = f x in x
On https://wiki.haskell.org/Prime_numbers, they define a different fixed-point combinator:
_Y :: (t -> t) -> t
_Y g = g (_Y g) -- multistage, non-sharing, g (g (g (g ...)))
-- g (let x = g x in x) -- two g stages, sharing
_Y is a non-sharing fixpoint combinator, here arranging for a recursive "telescoping" multistage primes production (a tower of producers).
What exactly does this mean? What is "sharing" vs. "non-sharing" in that context? How does _Y differ from fix?
"Sharing" means f x re-uses the x that it creates; but with _Y g = g . g . g . g . ..., each g calculates its output anew (cf. this and this).
In that context, the sharing version has much worse memory usage, leads to a space leak.1
The definition of _Y mirrors the usual lambda calculus definition's effect for the Y combinator, which emulates recursion by duplication, while true recursion refers to the same (hence, shared) entity.
In
x = f x
(_Y g) = g (_Y g)
both xs refer to the same entity, but each of the (_Y g)s refers to an equivalent, but separate, entity. That's the intention of it, anyway.
Of course thanks to referential transparency there's no guarantee in Haskell the language for any of this. But GHC the compiler does behave this way.
_Y g is a common sub-expression, and it could be "eliminated" by a compiler by giving it a name and reusing that named entity, subverting the whole purpose of it. That's why GHC has the -fno-cse flag ("no common sub-expression elimination"), which prevents this explicitly. It used to be that you had to use this flag to achieve the desired behaviour here, but not anymore: with more recent versions (read: for several years now), GHC is no longer as aggressive about common sub-expression elimination.
disclaimer: I'm the author of that part of the page you're referring to. Was hoping for the back-and-forth that's usual on wiki pages, but it never came, so my work didn't get reviewed like that. Either no-one bothered, or it is passable (lacking major errors). The wiki seems to be largely abandoned for many years now.
1 The g function involved,
(3:) . minus [5,7..] . foldr (\(x:xs) -> (x:) . union xs) []
     . map (\p -> [p*p, p*p+2*p..])
produces an increasing stream of all odd primes given an increasing stream of all odd primes. To produce a prime N in value, it consumes its input stream up to the first prime above sqrt(N) in value, at least. Thus the production points are given roughly by repeated squaring, and there are ~ log (log N) of such g functions in total in the chain (or "tower") of these primes producers, each immediately garbage collectible, the lowest one producing its primes given just the first odd prime, 3, known a priori.
And with the two-staged _Y2 g = g x where { x = g x } there would be only two of them in the chain, but only the top one would be immediately garbage collectible, as discussed at the referenced link above.
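For reference, here is a self-contained spelling of that g (my own fleshing-out: minus and union are written out by hand, as in Data.List.Ordered), so the footnote can be run directly:
primes :: [Int]
primes = 2 : _Y g
  where
    _Y h = h (_Y h)
    g = (3:) . minus [5,7..] . foldr (\(x:xs) -> (x:) . union xs) []
             . map (\p -> [p*p, p*p+2*p..])

-- Ordered-list difference and union, as in Data.List.Ordered.
minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : minus xt ys
  EQ ->     minus xt yt
  GT ->     minus xs yt
minus xs _ = xs

union :: Ord a => [a] -> [a] -> [a]
union xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : union xt ys
  EQ -> x : union xt yt
  GT -> y : union xs yt
union xs ys = xs ++ ys

main :: IO ()
main = print (take 10 primes)   -- [2,3,5,7,11,13,17,19,23,29]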
_Y is translated to the following STG:
_Y f = let x = _Y f in f x
fix is translated identically to the Haskell source:
fix f = let x = f x in x
So fix f sets up a recursive thunk x and returns it, while _Y is a recursive function, and importantly it’s not tail-recursive. Forcing _Y f enters f, passing a new call to _Y f as an argument, so each recursive call sets up a new thunk; forcing the x returned by fix f enters f, passing x itself as an argument, so each recursive call is into the same thunk—this is what’s meant by “sharing”.
The sharing version usually has better memory usage, and also lets the GHC RTS detect some kinds of infinite loop. When a thunk is forced, before evaluation starts, it’s replaced with a “black hole”; if at any point during evaluation of a thunk a black hole is reached from the same thread, then we know we have an infinite loop and can throw an exception (which you may have seen displayed as Exception: <<loop>>).
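A minimal illustration of that black-hole detection (assuming GHC; with optimisations the detection is not guaranteed and the program may simply diverge instead):
import Control.Exception (SomeException, evaluate, try)

loopy :: Int
loopy = let x = x + 1 in x   -- a thunk whose evaluation re-enters itself

main :: IO ()
main = do
  r <- try (evaluate loopy) :: IO (Either SomeException Int)
  print r   -- typically prints: Left <<loop>>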
I think you already received excellent answers, from a GHC/Haskell perspective. I just wanted to chime in and add a few historical/theoretical notes.
The correspondence between unfolding and cyclic views of recursion is rigorously studied in Hasegawa's PhD thesis: https://www.springer.com/us/book/9781447112211
(Here's a shorter paper that you can read without paying Springer: https://link.springer.com/content/pdf/10.1007%2F3-540-62688-3_37.pdf)
Hasegawa assumes a traced monoidal category, a requirement that is much less stringent than the usual PCPO assumption of domain theory, which forms the basis of how we think about Haskell in general. What Hasegawa showed was that one can define these "sharing" fixed point operators in such a setting, and established that they correspond to the usual unfolding view of fixed points from Church's lambda-calculus. That is, there is no way to tell them apart by making them produce different answers.
Hasegawa's correspondence holds for what's known as central arrows; i.e., when there are no "effects" involved. Later on, Benton and Hyland extended this work and showed that the correspondence holds for certain cases when the underlying arrow can perform "mild" monadic effects as well: https://pdfs.semanticscholar.org/7b5c/8ed42a65dbd37355088df9dde122efc9653d.pdf
Unfortunately, Benton and Hyland only allow effects that are quite "mild": Effects like the state and environment monads fit the bill, but not general effects like exceptions, lists, or IO. (The fixed point operators for these effectful computations are known as mfix in Haskell, with the type signature (a -> m a) -> m a, and they form the basis of the recursive-do notation.)
It's still an open question how to extend this work to cover arbitrary monadic effects. Though it doesn't seem to be receiving much attention these days. (Would make a great PhD topic for those interested in the correspondence between lambda-calculus, monadic effects, and graph-based computations.)
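As a tiny illustration of that mfix operator (nothing to do with the open problem above, just the value-recursion idea, here in the Maybe monad):
import Control.Monad.Fix (mfix)

-- Tie a knot through Maybe: the produced list refers back to itself.
ones :: Maybe [Int]
ones = mfix (\xs -> Just (1 : xs))

main :: IO ()
main = print (fmap (take 5) ones)   -- Just [1,1,1,1,1]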

Interesting operators in Haskell that obey modal axioms

I was just looking at the type of map :: (a -> b) -> [a] -> [b] and just the shape of this function made me wonder whether we could see the list forming operator [ ] as obeying various axioms common to normal modal logics (e.g, T, S4, S5, B), since we seem to have at least the K-axiom of normal modal logics, with [(a -> b)] -> [a] -> [b].
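For what it's worth, the K-shaped map at the list type can be written directly; a throwaway check:
axiomK :: [a -> b] -> [a] -> [b]
axiomK = (<*>)   -- the list Applicative's <*> has exactly the K shape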
This leads to my question: are there familiar, interesting operators or functors in Haskell which have the syntax of modal operators of a certain kind, and which obey the axioms common to normal modal logics (i.e, K, T, S4, S5 and B)?
This question can be sharpened and made more specific. Consider an operator L, and its dual M. Now the question becomes: are there any familiar, interesting operators in Haskell with some of the following properties:
(1) L(a -> b) -> La -> Lb
(2) La -> a
(3) Ma -> L(M a)
(4) La -> L(L a)
(5) a -> L(M a)
It would be very interesting to see some nice examples.
I've thought of a potential example, but it would be good to know whether I am correct: the double negation translation with L as not not and M as not. This translation takes every formula a to its double negation translation (a -> ⊥) -> ⊥ and, crucially, validates axioms (1)-(4), but not axiom (5). I asked a question here https://math.stackexchange.com/questions/2347437/continuations-in-mathematics-nice-examples and it seems the double negation translation can be simulated via the continuation monad, the endofunctor taking every formula a to its double negation translation (a -> ⊥) -> ⊥. There Derek Elkins notes the existence of a couple of double negation translations corresponding, via the Curry-Howard isomorphism, to different continuation-passing style transforms, e.g. Kolmogorov's corresponds to the call-by-name CPS transform.
Perhaps there are other operations that can be done in the continuation monad via Haskell which can validate axioms (1)-(5).
(And just to eliminate one example: there are clear relations between so-called Lax logic https://www.sciencedirect.com/science/article/pii/S0890540197926274 and Monads in Haskell, with the return operation obeying the laws of the modal operator of this logic (which is an endofunctor). I am not interested so much in those examples, but in examples of Haskell operators which obey some of the axioms of modal operators in classical normal modal logics)
Preliminary note: I apologise for spending a good chunk of this answer talking about Propositional Lax Logic, a topic you are very familiar with and not too interested in as far as this question is concerned. In any case, I do feel this theme is worthy of broader exposure -- and thanks for making me aware of it!
The modal operator in propositional lax logic (PLL) is the Curry-Howard counterpart of Monad type constructors. Note the correspondence between its axioms...
DT: x -> D x
D4: D (D x) -> D x
DF: (x -> y) -> D x -> D y
... and the types of return, join and fmap, respectively.
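Spelled out in Haskell, a small sketch with D read as an arbitrary Monad m:
import Control.Monad (join)

dT :: Monad m => x -> m x                 -- DT: x -> D x
dT = return

d4 :: Monad m => m (m x) -> m x           -- D4: D (D x) -> D x
d4 = join

dF :: Monad m => (x -> y) -> m x -> m y   -- DF: (x -> y) -> D x -> D y
dF = fmap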
There are a number of papers by Valeria de Paiva discussing intuitionistic modal logics and, in particular, PLL. The remarks about PLL here are largely based on Alechina et al., Categorical and Kripke Semantics for Constructive S4 Modal Logic (2001). Interestingly, that paper makes a case for PLL being less weird than it might seem at first (cf. Fairtlough and Mendler, Propositional Lax Logic (1997): "As a modal logic it is special because it features a single modal operator [...] that has a flavour both of possibility and necessity"). Starting with CS4, a version of intuitionistic S4 without distribution of possibility over disjunction...
B stands for box, and D for diamond
BK: B (x -> y) -> (B x -> B y)
BT: B x -> x
B4: B x -> B (B x)
DK: B (x -> y) -> (D x -> D y)
DT: x -> D x
D4: D (D x) -> D x
... and adding x -> B x to it causes B to become trivial (or, in Haskell parlance, Identity), simplifying the logic to PLL. That being so, PLL can be regarded as a special case of a variant of intuitionistic S4. Furthermore, it frames PLL's D as a possibility-like operator. That is intuitively appealing if we take D as the counterpart to Haskell Monads, which often do have a possibility flavour (consider Maybe Integer -- "There might be an Integer here" -- or IO Integer -- "I will get an Integer when the program is executed").
A few other possibilities:
At a glance, it seems the symmetrical move of making D trivial leads us to something very much like ComonadApply. I say "very much like" largely due to the functorial strength of Haskell Functors, the matter being that x /\ B y -> B (x /\ y) is an awkward thing to have if you are looking for a conventional necessity modality.
Kenneth Foner's Functional Pearl: Getting a Quick Fix on Comonads (thanks to dfeuer for the reference) works towards expressing intuitionistic K4 in Haskell, covering some of the difficulties along the way (including the functorial strength issue mentioned above).
Matt Parsons' Distributed Modal Logic offers a Haskell-oriented presentation of intuitionistic S5 and an interpretation of it, originally by Tom Murphy VII, in terms of distributed computing: B x as an x-producing computation that can be run anywhere on a network, and D x as the address of an x somewhere on it.
Temporal logics can be related via Curry-Howard to functional reactive programming (FRP). Suggestions of jumping-off points include de Paiva and Eades III, Constructive Temporal Logic, Categorically (2017), as well as this blog post by Philip Schuster alongside this interesting /r/haskell thread about it.

How to get simpler but equivalent version of a Haskell expression

Although I have been learning Haskell for some time, there is one common problem I run into constantly. Let's take this expression as an example:
e f $ g . h i . j
One may wonder, given $ and . from Prelude, what the type constraints on e or h are for the expression to be valid.
Is it possible to get a 'simpler' but equivalent representation? For me, 'simpler' would be one that uses parentheses everywhere and eliminates the need to know the operator precedence rules.
If not, which Haskell report sections do I need to read to have complete picture?
This might be relevant for many novice Haskell programmers. I know many programmers that add parentheses so that they do not need to memorize (or understand) precedence tables like this one: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/operators.html
Is it possible to get a 'simpler' but equivalent representation? Of course, this is called parsing, and is done by compilers, interpreters, etc.
90% of the time, all you need to remember is how $, . and function application f x work together. This is because $ and function application are really simple: they bind the loosest and the tightest respectively; they are like addition and exponentiation in BODMAS.
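If you forget the fixities, GHCi will report them (output abridged; the exact form varies by GHC version):
ghci> :info ($)
($) :: (a -> b) -> a -> b
infixr 0 $
ghci> :info (.)
(.) :: (b -> c) -> (a -> b) -> a -> c
infixr 9 .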
From your example
e f $ g . h i . j
the function applications bind first, so we have
(e f) $ g . (h i) . j
Function application is left associative so
f g h ==> ((f g) h)
You may have to google currying to understand why the above can be used like foo(a, b) in other languages.
In the next step, do everything in the middle. I just use brackets or a table to remember this bit; it's usually straightforward. For example, there are several operators like >> and >>= that get used together when you are working with monads; I just add brackets when GHC complains.
So now we have
(e f) $ (g . ((h i) . j))
Where the brackets go doesn't matter, as function composition is associative; Haskell happens to parse . as right-associative.
So then we have
((e f) (g . ((h i) . j)))
The above (simple) example demonstrates why those operators exist in the first place.

Would a type class "between" Category and Arrow make sense?

Often you have something like an Applicative without pure, or something like a Monad, but without return. The semigroupoid package covers these cases with Apply and Bind. Now I'm in a similar situation concerning Arrow, where I can't define a meaningful arr function, but I think the other functions would make perfect sense.
I defined a type that holds a function and its inverse:
import Control.Category
data Rev a b = Rev (a -> b) (b -> a)
reverse (Rev f g) = Rev g f
apply (Rev f _) x = f x
applyReverse (Rev _ g) y = g y
compose (Rev f f') (Rev g g') = Rev ((Prelude..) f g) ((Prelude..) g' f')
instance Category Rev where
  id = Rev Prelude.id Prelude.id
  (.) x y = compose x y
Now I can't implement Arrow, but something weaker:
--"Ow" is an "Arrow" without "arr"
class Category a => Ow a where
  first :: a b c -> a (b,d) (c,d)
  first f = stars f Control.Category.id
  second :: a b c -> a (d,b) (d,c)
  second f = stars Control.Category.id f
  --same as (***)
  stars :: a b c -> a b' c' -> a (b,b') (c,c')
...
import Control.Arrow
instance Ow Rev where
  stars (Rev f f') (Rev g g') = Rev (f *** g) (f' *** g')
I think I can't implement the equivalent of &&&, as it is defined as f &&& g = arr (\b -> (b,b)) >>> f *** g, and (\b -> (b,b)) isn't reversible. Still, do you think this weaker type class could be useful? Does it even make sense from a theoretical point of view?
This approach was explored in "There and Back Again: Arrows for invertible programming": http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.9383
For precisely the reasons you're running into, this turned out to be a bad approach that wasn't picked up more widely. More recently, Tillmann Rendel produced a delightful approach to invertible syntax that substituted partial isomorphisms for biarrows ( http://www.informatik.uni-marburg.de/~rendel/rendel10invertible.pdf ) . This has been packaged up on hackage for folks to use and play with: http://hackage.haskell.org/package/invertible-syntax
That said, I think an arrow without arr makes a certain amount of sense. I just don't think that such a thing is an appropriate vehicle to capture invertible functions.
Edit: There's also Adam Megacz's Generalized Arrows (http://www.cs.berkeley.edu/~megacz/garrows/). These are maybe not useful for invertible programming either (though the basic typeclass does seem to be invertible), but they do have uses in other situations where arr is too strong yet the other arrow operations still make sense.
From a Category Theory standpoint, the Category type class describes any category whose arrows can be described straightforwardly in Haskell by a type constructor. Almost any additional feature you want to build on top of this, in the form of new primitive arrows or arrow-building functions, will make sense to some extent if you can implement it using total functions. The only caveat is that adding expressive power can break other requirements, as often happens with arr.
Your specific example of invertible functions describes a category where all arrows are isomorphisms. In a shocking twist of the completely and utterly expected, Edward Kmett already has an implementation of this on Hackage.
The arr function roughly amounts to a functor (in the category theory sense) from Haskell functions to the Arrow instance, leaving objects the same (i.e., the type parameters). Simply removing arr from Arrow gives you... something else, which is probably not very useful on its own, without at least adding the equivalents of arr fst and arr snd as primitives.
I believe that adding primitives for fst and snd, along with (&&&) to build a new arrow from two inputs, should give you a category with products, which is absolutely sensible from a theoretical point of view, though it is not compatible with the invertible arrows you're using, for the reasons you've found. A sketch of such a class follows.
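To make that concrete, a rough sketch of such a class (the names CartesianCat, fstA, sndA and fanout are mine, purely illustrative):
import Prelude hiding (id, (.))
import Control.Category

class Category a => CartesianCat a where
  fstA   :: a (b, d) b
  sndA   :: a (b, d) d
  fanout :: a b c -> a b c' -> a b (c, c')   -- plays the role of (&&&)

-- first/second from the Ow class in the question become derivable:
firstC :: CartesianCat a => a b c -> a (b, d) (c, d)
firstC f = fanout (f . fstA) sndA

secondC :: CartesianCat a => a b c -> a (d, b) (d, c)
secondC f = fanout fstA (f . sndA)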

Resources