Rewriting as a practical optimization technique in GHC: Is it really needed? - haskell

I was reading the paper authored by Simon Peyton Jones, et al. named “Playing by the Rules: Rewriting as a practical optimization technique in GHC”. In the second section, namely “The basic idea” they write:
Consider the familiar map function, that applies a function to each element of a list. Written in Haskell, map looks like this:
map f [] = []
map f (x:xs) = f x : map f xs
Now suppose that the compiler encounters the following call of map:
map f (map g xs)
We know that this expression is equivalent to
map (f . g) xs
(where “.” is function composition), and we know that the latter expression is more efficient than the former because there is no intermediate list. But the compiler has no such knowledge.
One possible rejoinder is that the compiler should be smarter --- but the programmer will always know things that the compiler cannot figure out. Another suggestion is this: allow the programmer to communicate such knowledge directly to the compiler. That is the direction we explore here.
My question is, why can't we make the compiler smarter? The authors say that “but the programmer will always know things that the compiler cannot figure out”. However, that's not a valid answer because the compiler can indeed figure out that map f (map g xs) is equivalent to map (f . g) xs, and here is how:
map f (map g xs)
map g xs unifies with map f [] = [].
Hence map g [] = [].
map f (map g []) = map f [].
map f [] unifies with map f [] = [].
Hence map f (map g []) = [].
map g xs unifies with map f (x:xs) = f x : map f xs.
Hence map g (x:xs) = g x : map g xs.
map f (map g (x:xs)) = map f (g x : map g xs).
map f (g x : map g xs) unifies with map f (x:xs) = f x : map f xs.
Hence map f (map g (x:xs)) = f (g x) : map f (map g xs).
Hence we now have the rules:
map f (map g []) = []
map f (map g (x:xs)) = f (g x) : map f (map g xs)
As you can see, f (g x) is just (f . g) x, and map f (map g xs) is being called recursively. This is exactly the definition of map (f . g) xs. The algorithm for this automatic conversion seems to be pretty simple. So why not implement this instead of rewrite rules?
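For reference, the paper's own proposal is to let the programmer state exactly this equation as a GHC rewrite rule, along these lines:
{-# RULES
"map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
  #-}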

Aggressive inlining can derive many of the equalities that rewrite rules are short-hand for.
The difference is that inlining is "blind", so you don't know in advance if the result will be better or worse, or even if it will terminate.
Rewrite rules, however, can do completely non-obvious things, based on much higher level facts about the program. Think of rewrite rules as adding new axioms to the optimizer. By adding these you have a richer rule set to apply, making complicated optimizations easier to apply.
Stream fusion, for example, changes the data type representation. This cannot be expressed through inlining, as it involves a representation type change (we reframe the optimization problem in terms of the Stream ADT). Easy to state in rewrite rules, impossible with inlining alone.
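To give a flavour of what that looks like, here is a heavily simplified sketch of the Stream representation change (illustrative only; real stream-fusion libraries such as vector are considerably more elaborate):
{-# LANGUAGE ExistentialQuantification #-}

data Step s a = Done | Yield a s

data Stream a = forall s. Stream (s -> Step s a) s

-- Convert between the list and Stream representations.
stream :: [a] -> Stream a
stream = Stream next
  where
    next []     = Done
    next (x:xs) = Yield x xs

unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
             Done       -> []
             Yield x s' -> x : go s'

-- A map written against the Stream representation: no intermediate list.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
                Done       -> Done
                Yield x s' -> Yield (f x) s'

-- The key rule: adjacent conversions cancel, so pipelines written via
-- stream/unstream fuse into a single Stream traversal.
{-# RULES "stream/unstream" forall s. stream (unstream s) = s #-}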

Something in that direction was investigated in a Bachelor’s thesis of Johannes Bader, a student of mine: Finding Equations in Functional Programs (PDF file).
To some degree it is certainly possible, but
it is quite tricky. Finding such equations is in a sense as hard as finding proofs in a theorem prover, and
it is often not very useful, because it tends to find equations that the programmer would rarely write directly.
It is however useful to clean up after other transformations such as inlining and various forms of fusion.

This could be viewed as a trade-off between optimizing the specific case and optimizing the general case. This trade-off can produce funny situations where you know how to make something faster, but it is better for the language in general if you don't.
In the specific case of maps in the structure you give, the compiler could find the optimization. However, what about related structures? What if the function isn't map? What if there's an additional layer of indirection, such as a function that returns map? In those cases, the compiler cannot optimize easily. This is the general-case problem.
Now, if you do optimize the special case, one of two outcomes occurs:
Nobody relies on it, because they aren't sure whether it is there or not. In that case, articles like the one you quote get written.
People do start relying on it, and now every developer is forced to remember "maps done in this configuration get automatically converted to the fast version for me, but if I do it in that configuration they don't." This starts to shape the way people use the language, and can actually reduce readability!
Given the need for developers to think about such optimizations in the general case, we expect to see developers doing these optimizations themselves in the simple case, decreasing the need for the optimization in the first place!
Now, if it turns out that the particular case you are interested in accounts for something massive like 2% of the world's Haskell codebase, there would be a much stronger argument for applying your special-case optimization.

Related

Sharing vs. non-sharing fixed-point combinator

This is the usual definition of the fixed-point combinator in Haskell:
fix :: (a -> a) -> a
fix f = let x = f x in x
On https://wiki.haskell.org/Prime_numbers, they define a different fixed-point combinator:
_Y :: (t -> t) -> t
_Y g = g (_Y g) -- multistage, non-sharing, g (g (g (g ...)))
-- g (let x = g x in x) -- two g stages, sharing
_Y is a non-sharing fixpoint combinator, here arranging for a recursive "telescoping" multistage primes production (a tower of producers).
What exactly does this mean? What is "sharing" vs. "non-sharing" in that context? How does _Y differ from fix?
"Sharing" means f x re-uses the x that it creates; but with _Y g = g . g . g . g . ..., each g calculates its output anew (cf. this and this).
In that context, the sharing version has much worse memory usage and leads to a space leak.1
The definition of _Y mirrors the usual lambda calculus definition's effect for the Y combinator, which emulates recursion by duplication, while true recursion refers to the same (hence, shared) entity.
In
x = f x
(_Y g) = g (_Y g)
both xs refer to the same entity, but each (_Y g) refers to an equivalent, but separate, entity. That's the intention of it, anyway.
Of course thanks to referential transparency there's no guarantee in Haskell the language for any of this. But GHC the compiler does behave this way.
_Y g is a common sub-expression and it could be "eliminated" by a compiler by giving it a name and reusing that named entity, subverting the whole purpose of it. That's why GHC has the -fno-cse flag ("no common sub-expression elimination"), which prevents this explicitly. It used to be that you had to use this flag to achieve the desired behaviour here, but not anymore: more recent (read: several years old by now) versions of GHC are no longer as aggressive about common sub-expression elimination.
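A tiny illustration of the difference (a sketch): with the sharing fix, applying (1:) ties a knot and produces a cyclic list, while _Y (1:) allocates a fresh cons cell for every element demanded:
import Data.Function (fix)   -- fix f = let x = f x in x

_Y :: (t -> t) -> t
_Y g = g (_Y g)

onesShared, onesUnshared :: [Integer]
onesShared   = fix (1:)   -- one cons cell that refers back to itself (sharing)
onesUnshared = _Y (1:)    -- a new cons cell per element demanded (no sharing)

main :: IO ()
main = print (take 5 onesShared, take 5 onesUnshared)   -- ([1,1,1,1,1],[1,1,1,1,1])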
disclaimer: I'm the author of that part of the page you're referring to. Was hoping for the back-and-forth that's usual on wiki pages, but it never came, so my work didn't get reviewed like that. Either no-one bothered, or it is passable (lacking major errors). The wiki seems to be largely abandoned for many years now.
1 The g function involved,
(3:) . minus [5,7..] . foldr (\(x:xs) -> (x:) . union xs) []
     . map (\p -> [p*p, p*p+2*p ..])
produces an increasing stream of all odd primes given an increasing stream of all odd primes. To produce a prime N in value, it consumes its input stream up to the first prime above sqrt(N) in value, at least. Thus the production points are given roughly by repeated squaring, and there are ~ log (log N) such g functions in total in the chain (or "tower") of these primes producers, each immediately garbage collectible, with the lowest one producing its primes given just the first odd prime, 3, known a priori.
And with the two-staged _Y2 g = g x where { x = g x } there would be only two of them in the chain, but only the top one would be immediately garbage collectible, as discussed at the referenced link above.
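For concreteness, the whole thing assembled (a sketch; minus and union are assumed to come from Data.List.Ordered in the data-ordlist package, matching the helpers the wiki page uses):
import Data.List.Ordered (minus, union)   -- from the data-ordlist package

_Y :: (t -> t) -> t
_Y g = g (_Y g)

primes :: [Int]
primes = 2 : _Y ((3:) . minus [5,7..]
                      . foldr (\(x:xs) -> (x:) . union xs) []
                      . map (\p -> [p*p, p*p+2*p ..]))

-- take 10 primes == [2,3,5,7,11,13,17,19,23,29]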
_Y is translated to the following STG:
_Y f = let x = _Y f in f x
fix is translated identically to the Haskell source:
fix f = let x = f x in x
So fix f sets up a recursive thunk x and returns it, while _Y is a recursive function, and importantly it’s not tail-recursive. Forcing _Y f enters f, passing a new call to _Y f as an argument, so each recursive call sets up a new thunk; forcing the x returned by fix f enters f, passing x itself as an argument, so each recursive call is into the same thunk—this is what’s meant by “sharing”.
The sharing version usually has better memory usage, and also lets the GHC RTS detect some kinds of infinite loop. When a thunk is forced, before evaluation starts, it’s replaced with a “black hole”; if at any point during evaluation of a thunk a black hole is reached from the same thread, then we know we have an infinite loop and can throw an exception (which you may have seen displayed as Exception: <<loop>>).
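A small way to see that detection in action (a sketch; the exact behaviour is GHC-RTS-specific):
import Data.Function (fix)

_Y :: (t -> t) -> t
_Y g = g (_Y g)

main :: IO ()
main = print (fix id :: Int)
-- forcing fix id re-enters its own black hole, so the RTS throws the
-- NonTermination exception, displayed as <<loop>>.
-- print (_Y id :: Int) would instead spin forever: each step builds a fresh
-- thunk for _Y id, so no black hole is ever re-entered.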
I think you already received excellent answers, from a GHC/Haskell perspective. I just wanted to chime in and add a few historical/theoretical notes.
The correspondence between unfolding and cyclic views of recursion is rigorously studied in Hasegawa's PhD thesis: https://www.springer.com/us/book/9781447112211
(Here's a shorter paper that you can read without paying Springer: https://link.springer.com/content/pdf/10.1007%2F3-540-62688-3_37.pdf)
Hasegawa assumes a traced monoidal category, a requirement that is much less stringent than the usual PCPO assumption of domain theory, which forms the basis of how we think about Haskell in general. What Hasegawa showed was that one can define these "sharing" fixed point operators in such a setting, and established that they correspond to the usual unfolding view of fixed points from Church's lambda-calculus. That is, there is no way to tell them apart by making them produce different answers.
Hasegawa's correspondence holds for what's known as central arrows; i.e., when there are no "effects" involved. Later on, Benton and Hyland extended this work and showed that the correspondence holds for certain cases when the underlying arrow can perform "mild" monadic effects as well: https://pdfs.semanticscholar.org/7b5c/8ed42a65dbd37355088df9dde122efc9653d.pdf
Unfortunately, Benton and Hyland only allow effects that are quite "mild": Effects like the state and environment monads fit the bill, but not general effects like exceptions, lists, or IO. (The fixed point operators for these effectful computations are known as mfix in Haskell, with the type signature (a -> m a) -> m a, and they form the basis of the recursive-do notation.)
It's still an open question how to extend this work to cover arbitrary monadic effects. Though it doesn't seem to be receiving much attention these days. (Would make a great PhD topic for those interested in the correspondence between lambda-calculus, monadic effects, and graph-based computations.)
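For readers who haven't met mfix before, here is the classic knot-tying example it enables via the recursive-do notation (a minimal sketch):
{-# LANGUAGE RecursiveDo #-}
import Data.IORef

-- A node holding a value and a reference to another node.
data Node = Node Int (IORef Node)

-- mdo desugars to mfix for IO: `ref` is used before it is bound,
-- building a node that points at itself.
mkCycle :: IO (IORef Node)
mkCycle = mdo
  ref <- newIORef (Node 0 ref)
  return ref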

Hlint suggestion: use uncurry

I have this line of code:
map (\(u,v) -> flatTorus n u v) gridUV
HLint suggests that I replace it with
map (uncurry (flatTorus n)) gridUV
What is the motivation for this suggestion? Is it for brevity only, or something else (performance)? Because even though it is longer, I find the first version easier to read.
In fact my question is more general, because this is just one example among many: are HLint's suggestions generally motivated by brevity alone, or are there other improvements behind them?
I think Hlint prefers using uncurry because it gives you an invariant representation of the callback. Lambda expressions are inherently sensitive to representation, since
\(u, v) -> flatTorus n u v
is equivalent to
\(x, y) -> flatTorus n x y
even though they are textually different.
Using uncurry frees readers of the cognitive load of doing alpha equivalence in their head (e.g., recognizing that the above two expressions are the same), but then saddles them with the cognitive load of having to remember a vocabulary of combinators. Ultimately, it's a matter of taste.
These are not actually quite equivalent.
(\(x, y) -> (,) x y) undefined = undefined
uncurry (,) undefined = (undefined, undefined)
So you should take any suggestion to use uncurry with a grain of salt. Think about whether that extra laziness will help, hurt, or make no difference.
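A quick way to see that difference in GHCi (hypothetical session; uncurry is lazier here because it is defined via fst and snd rather than by pattern-matching on the pair):
ghci> case uncurry (,) undefined of (_, _) -> "fine"
"fine"
ghci> case (\(x, y) -> (x, y)) undefined of (_, _) -> "fine"
"*** Exception: Prelude.undefined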

Rules firing on class methods

My apologies if this is a somewhat vague question, I just want to know I'm heading in the right direction. (GHC 8.0.2)
I've got a "list like" data type; let's call it T a. For example, it's an instance of Foldable, Functor and Monoid.
There's a few rules I'd like to put into place, for example (roughly speaking, not using exact syntax here):
You can fold two lists appended just by folding the first then folding the second (there's no need to actually append the lists):
foldl' f z (x ++ y) -> foldl' f (foldl' f z x) y
If you're folding a list that's been mapped, you can just drag the map function into the fold, eliminating the map:
foldl' f z (fmap g xs) -> let h acc y = f acc (g y) in foldl' h z xs
If you want to map appended lists, just map the individual ones then append them:
fmap f (x ++ y) -> fmap f x ++ fmap f y
How should I be doing this? What I've currently done is have my instance methods call a top-level function, like fmap', which does all the work, and then I've applied the rules to these functions. However, -ddump-rule-rewrites doesn't seem to show my rules firing; instead there are a lot of Class op * rules firing, like Class op fmap and Class op foldl'.
Should I be:
a. Doing what I'm doing?
b. Applying my rules to the class methods directly?
c. The standard class rules should cover the situations for my data type, there's no need to do anything? OR
d. Some combination of the above, involving a mixture of INLINE and NOINLINE pragmas at the appropriate stage (please detail)?
I suspect the answer is (d), I'd just like some guidance to get started.
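For what it's worth, here is a sketch of what one version of (d) can look like, with a throw-away list-backed type standing in for T a (the names and the phase number are illustrative, not a definitive recipe):
newtype T a = T [a]

-- The top-level worker that the rules target.
fmapT :: (a -> b) -> T a -> T b
fmapT f (T xs) = T (map f xs)
{-# NOINLINE [1] fmapT #-}   -- keep fmapT recognisable to the rule matcher until phase 1

instance Functor T where
  fmap = fmapT
  {-# INLINE fmap #-}
  -- after GHC's "Class op fmap" rule selects the instance method, inlining
  -- exposes fmapT at the call site so the user rule below can fire

{-# RULES
"fmapT/fmapT" forall f g x. fmapT f (fmapT g x) = fmapT (f . g) x
  #-}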

How do you create a rewrite pass based on whether two expressions refer to the same bound name?

How do you find and rewrite expressions that refer to the same bound name? For example, in the expression
let xs = ...
in ...map f xs...map g xs...
both the expression map f xs and the expression map g xs refer to the same bound name, namely xs. Are there any standard compiler analyses that would let us identify this situation and rewrite the two map expressions to e.g.
let xs = ...
e = unzip (map (f *** g) xs)
in ...fst e...snd e...
I've been thinking about the problem in terms of a tree traversal. For example given the AST:
data Ast = Map (a -> b) Ast
         | Var String
         | ...
we could try to write a tree traversal to detect this case, but that seems difficult since two Map nodes that refer to the same Var might appear at widely different places in the tree. This analysis seems easier to do if you inverted all the references in the AST, making it a graph, but I wanted to see if there are any alternatives to that approach.
I think what you are looking for is a set of program transformations usually referred to as Tupling, Fusion, and Supercompilation, which fall under the more general theory of Unfold/Fold transformation. You can achieve what you want as follows.
First perform speculative evaluations (Unfolding) by "driving" the definition of map over the arguments, which gives rise to two new pseudo programs, depending on whether xs is of the form y:ys or []. In pseudo code:
let y:ys = ...
in ...(f y):(map f ys)...(g y):(map g ys)...
let [] = ...
in ...[]...[]...
Then perform abstractions for shared structure (Tupling) and generalisations (Folding) with respect to the original program to stop otherwise perpetual unfolding:
let xs = ...
in ...(fst tuple)...(snd tuple)...
  where tuple = generalisation xs
        generalisation []     = ([], [])
        generalisation (y:ys) = let tuple = generalisation ys
                                in ((f y):(fst tuple), (g y):(snd tuple))
I hope this gives you an idea, but program transformation is a research field in its own right, and it is hard to explain well without drawing directed acyclic graphs.
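In plain Haskell, the tupled generalisation comes out roughly like this (a sketch; f and g stand for whatever the two original map calls applied):
-- One traversal of xs produces both mapped results.
mapPair :: (a -> b) -> (a -> c) -> [a] -> ([b], [c])
mapPair _ _ []     = ([], [])
mapPair f g (y:ys) = let (bs, cs) = mapPair f g ys
                     in (f y : bs, g y : cs)

-- so   ...map f xs...map g xs...
-- becomes
--      let (fxs, gxs) = mapPair f g xs in ...fxs...gxs...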

Haskell composition (.) vs F#'s pipe forward operator (|>)

In F#, use of the pipe-forward operator, |>, is pretty common. However, in Haskell I've only ever seen function composition, (.), being used. I understand that they are related, but is there a language reason that pipe-forward isn't used in Haskell, or is it something else?
In F# (|>) is important because of the left-to-right typechecking. For example:
List.map (fun x -> x.Value) xs
generally won't typecheck, because even if the type of xs is known, the type of the argument x to the lambda isn't known at the time the typechecker sees it, so it doesn't know how to resolve x.Value.
In contrast
xs |> List.map (fun x -> x.Value)
will work fine, because the type of xs will lead to the type of x being known.
The left-to-right typechecking is required because of the name resolution involved in constructs like x.Value. Simon Peyton Jones has written a proposal for adding a similar kind of name resolution to Haskell, but he suggests using local constraints to track whether a type supports a particular operation or not, instead. So in the first sample the requirement that x needs a Value property would be carried forward until xs was seen and this requirement could be resolved. This does complicate the type system, though.
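To get a feel for what carrying such a constraint forward looks like, GHC's later HasField machinery works in this spirit (a sketch, assuming a reasonably recent GHC exposing GHC.Records):
{-# LANGUAGE DataKinds, FlexibleContexts, TypeApplications #-}
import GHC.Records (HasField, getField)

-- The "needs a field called value" requirement travels as a constraint
-- until the element type of the list is known at the call site.
values :: HasField "value" r b => [r] -> [b]
values = map (getField @"value")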
I am being a little speculative...
Culture: I think |> is an important operator in the F# "culture", and perhaps similarly with . for Haskell. F# has a function composition operator << but I think the F# community tends to use points-free style less than the Haskell community.
Language differences: I don't know enough about both languages to compare, but perhaps the rules for generalizing let-bindings are sufficiently different as to affect this. For example, I know in F# sometimes writing
let f = exp
will not compile, and you need explicit eta-conversion:
let f x = (exp) x // or x |> exp
to make it compile. This also steers people away from points-free/compositional style, and towards the pipelining style. Also, F# type inference sometimes demands pipelining, so that a known type appears on the left (see here).
(Personally, I find points-free style unreadable, but I suppose every new/different thing seems unreadable until you become accustomed to it.)
I think both are potentially viable in either language, and history/culture/accident may define why each community settled at a different "attractor".
More speculation, this time from the predominantly Haskell side...
($) is the flip of (|>), and its use is quite common when you can't write point-free code. So the main reason that (|>) is not used in Haskell is that its place is already taken by ($).
Also, speaking from a bit of F# experience, I think (|>) is so popular in F# code because it resembles the Subject.Verb(Object) structure of OO. Since F# is aiming for a smooth functional/OO integration, Subject |> Verb Object is a pretty smooth transition for new functional programmers.
Personally, I like thinking left-to-right too, so I use (|>) in Haskell, but I don't think many other people do.
I think we're confusing things. Haskell's (.) is equivalent to F#'s (>>). Not to be confused with F#'s (|>) which is just inverted function application and is like Haskell's ($) - reversed:
let (>>) f g x = g (f x)
let (|>) x f = f x
I believe Haskell programmers do use $ often. Perhaps not as often as F# programmers tend to use |>. On the other hand, some F# guys use >> to a ridiculous degree: http://blogs.msdn.com/b/ashleyf/archive/2011/04/21/programming-is-pointless.aspx
If you want to use F#'s |> in Haskell, the & operator in Data.Function is it (since base 4.8.0.0).
I have seen >>> being used for flip (.), and I often use that myself, especially for long chains that are best understood left-to-right.
>>> is actually from Control.Arrow, and works on more than just functions.
Left-to-right composition in Haskell
Some people use left-to-right (message-passing) style in Haskell too. See, for example, mps library on Hackage. An example:
euler_1 = ( [3,6..999] ++ [5,10..999] ).unique.sum
I think this style looks nice in some situations, but it's harder to read (one needs to know the library and all its operators, the redefined (.) is disturbing too).
There are also left-to-right as well as right-to-left composition operators in Control.Category, part of the base package. Compare >>> and <<< respectively:
ghci> :m + Control.Category
ghci> let f = (+2) ; g = (*3) in map ($1) [f >>> g, f <<< g]
[9,5]
There is a good reason to prefer left-to-right composition sometimes: evaluation order follows reading order.
I think F#'s pipe-forward operator (|>) corresponds to (&) in Haskell.
-- pipe operator example in Haskell
factorial :: (Eq a, Num a) => a -> a
factorial x =
  case x of
    1 -> 1
    _ -> x * factorial (x - 1)

-- in GHCi (import Data.Function ((&)) first):
ghci> 5 & factorial & show
"120"
If you don't like the (&) operator, you can define your own, like in F# or Elixir:
(|>) :: a -> (a -> b) -> b
(|>) x f = f x
infixl 1 |>
ghci> 5 |> factorial |> show
Why infixl 1 |>? See the documentation for (&) in Data.Function.
infixl = infix + left associativity
infixr = infix + right associativity
(.)
(.) means function composition. In math terms, (f . g)(x) = f(g(x)).
foo = negate . (*3)

ghci> foo 1
-3
ghci> foo 5
-15
It is equivalent to
-- (1)
foo x = negate (x * 3)
or
-- (2)
foo x = negate $ x * 3
The ($) operator is also defined in Data.Function.
(.) is used to create higher-order functions (think closures in JS). See this example:
-- (1) use a lambda expression to create a higher-order function
ghci> map (\x -> negate (abs x)) [5,-3,-6,7,-3,2,-19,24]
[-5,-3,-6,-7,-3,-2,-19,-24]
-- (2) use the . operator to create a higher-order function
ghci> map (negate . abs) [5,-3,-6,7,-3,2,-19,24]
[-5,-3,-6,-7,-3,-2,-19,-24]
Less code is better.
Compare |> and .
ghci> 5 |> factorial |> show
-- equals
ghci> (show . factorial) 5
-- equals
ghci> show . factorial $ 5
It is the difference between left-to-right and right-to-left. ⊙﹏⊙|||
Readability
|> and & are better than .
because
ghci> sum (replicate 5 (max 6.7 8.9))
-- equals
ghci> 8.9 & max 6.7 & replicate 5 & sum
-- equals
ghci> 8.9 |> max 6.7 |> replicate 5 |> sum
-- equals
ghci> (sum . replicate 5 . max 6.7) 8.9
-- equals
ghci> sum . replicate 5 . max 6.7 $ 8.9
How do you do functional programming in an object-oriented language?
Have a look at http://reactivex.io/
It supports:
Java: RxJava
JavaScript: RxJS
C#: Rx.NET
C#(Unity): UniRx
Scala: RxScala
Clojure: RxClojure
C++: RxCpp
Lua: RxLua
Ruby: Rx.rb
Python: RxPY
Go: RxGo
Groovy: RxGroovy
JRuby: RxJRuby
Kotlin: RxKotlin
Swift: RxSwift
PHP: RxPHP
Elixir: reaxive
Dart: RxDart
Aside from style and culture, this boils down to optimizing the language design for either pure or impure code.
The |> operator is common in F# largely because it helps to hide two limitations that appear with predominantly-impure code:
Left-to-right type inference without structural subtypes.
The value restriction.
Note that the former limitation does not exist in OCaml because subtyping is structural instead of nominal, so the structural type is easily refined via unification as type inference progresses.
Haskell takes a different trade-off, choosing to focus on predominantly-pure code where these limitations can be lifted.
This is my first day trying Haskell (after Rust and F#), and I was able to define F#'s |> operator:
(|>) :: a -> (a -> b) -> b
(|>) x f = f x
infixl 0 |>
and it seems to work:
factorial x =
  case x of
    1 -> 1
    _ -> x * factorial (x - 1)

main =
  5 |> factorial |> print
I bet a Haskell expert can give you an even better solution.

Resources