Hlint suggestion: use uncurry - haskell

I have this line of code:
map (\(u,v) -> flatTorus n u v) gridUV
Hlint suggests me to replace it with
map (uncurry (flatTorus n)) gridUV
What is the motivation of this suggestion ? Is it for shortness only, or something else (performance) ? Because though it is longer, I find the first code more easy to read.
In fact my question is more general,, because this is just an example among others: does Hlint suggestions are generally based on a shortness motivation only or are there other improvements behind the suggestions ?

I think Hlint prefers using uncurry because it gives you an invariant representation of the callback. Lambda expressions are inherently sensitive to representation, since
\(u, v) -> flatTorus n u v
is equivalent to
\(x, y) -> flatTorus n x y
even though they are textually different.
Using uncurry frees readers of the cognitive load of doing alpha equivalence in their head (e.g., recognizing that the above two expressions are the same), but then saddles them with the cognitive load of having to remember a vocabulary of combinators. Ultimately, it's a matter of taste.

These are not actually quite equivalent.
(\(x, y) -> (,) x y) undefined = undefined
uncurry (,) undefined = (undefined, undefined)
Do you should take any suggestion to use uncurry with a grain of salt. Think about whether that extra laziness will help, hurt, or make no difference.

Related

Sharing vs. non-sharing fixed-point combinator

This is the usual definition of the fixed-point combinator in Haskell:
fix :: (a -> a) -> a
fix f = let x = f x in x
On https://wiki.haskell.org/Prime_numbers, they define a different fixed-point combinator:
_Y :: (t -> t) -> t
_Y g = g (_Y g) -- multistage, non-sharing, g (g (g (g ...)))
-- g (let x = g x in x) -- two g stages, sharing
_Y is a non-sharing fixpoint combinator, here arranging for a recursive "telescoping" multistage primes production (a tower of producers).
What exactly does this mean? What is "sharing" vs. "non-sharing" in that context? How does _Y differ from fix?
"Sharing" means f x re-uses the x that it creates; but with _Y g = g . g . g . g . ..., each g calculates its output anew (cf. this and this).
In that context, the sharing version has much worse memory usage, leads to a space leak.1
The definition of _Y mirrors the usual lambda calculus definition's effect for the Y combinator, which emulates recursion by duplication, while true recursion refers to the same (hence, shared) entity.
In
x = f x
(_Y g) = g (_Y g)
both xs refer to the same entity, but each of (_Y g)s refer to equivalent, but separate, entity. That's the intention of it, anyway.
Of course thanks to referential transparency there's no guarantee in Haskell the language for any of this. But GHC the compiler does behave this way.
_Y g is a common sub-expression and it could be "eliminated" by a compiler by giving it a name and reusing that named entity, subverting the whole purpose of it. That's why the GHC has the "no common sub-expressions elimination" -fno-cse flag which prevents this explicitly. It used to be that you had to use this flag to achieve the desired behaviour here, but not anymore. GHC won't be as aggressive at common sub-expressions elimination anymore, with the more recent (read: several years now) versions.
disclaimer: I'm the author of that part of the page you're referring to. Was hoping for the back-and-forth that's usual on wiki pages, but it never came, so my work didn't get reviewed like that. Either no-one bothered, or it is passable (lacking major errors). The wiki seems to be largely abandoned for many years now.
1 The g function involved,
(3:) . minus [5,7..] . foldr (\ (x:xs) ⟶ (x:) . union xs) []
. map (\ p ⟶ [p², p² + 2p..])
produces an increasing stream of all odd primes given an increasing stream of all odd primes. To produce a prime N in value, it consumes its input stream up to the first prime above sqrt(N) in value, at least. Thus the production points are given roughly by repeated squaring, and there are ~ log (log N) of such g functions in total in the chain (or "tower") of these primes producers, each immediately garbage collectible, the lowest one producing its primes given just the first odd prime, 3, known a priori.
And with the two-staged _Y2 g = g x where { x = g x } there would be only two of them in the chain, but only the top one would be immediately garbage collectible, as discussed at the referenced link above.
_Y is translated to the following STG:
_Y f = let x = _Y f in f x
fix is translated identically to the Haskell source:
fix f = let x = f x in x
So fix f sets up a recursive thunk x and returns it, while _Y is a recursive function, and importantly it’s not tail-recursive. Forcing _Y f enters f, passing a new call to _Y f as an argument, so each recursive call sets up a new thunk; forcing the x returned by fix f enters f, passing x itself as an argument, so each recursive call is into the same thunk—this is what’s meant by “sharing”.
The sharing version usually has better memory usage, and also lets the GHC RTS detect some kinds of infinite loop. When a thunk is forced, before evaluation starts, it’s replaced with a “black hole”; if at any point during evaluation of a thunk a black hole is reached from the same thread, then we know we have an infinite loop and can throw an exception (which you may have seen displayed as Exception: <<loop>>).
I think you already received excellent answers, from a GHC/Haskell perspective. I just wanted to chime in and add a few historical/theoretical notes.
The correspondence between unfolding and cyclic views of recursion is rigorously studied in Hasegawa's PhD thesis: https://www.springer.com/us/book/9781447112211
(Here's a shorter paper that you can read without paying Springer: https://link.springer.com/content/pdf/10.1007%2F3-540-62688-3_37.pdf)
Hasegawa assumes a traced monoidal category, a requirement that is much less stringent than the usual PCPO assumption of domain theory, which forms the basis of how we think about Haskell in general. What Hasegawa showed was that one can define these "sharing" fixed point operators in such a setting, and established that they correspond to the usual unfolding view of fixed points from Church's lambda-calculus. That is, there is no way to tell them apart by making them produce different answers.
Hasegawa's correspondence holds for what's known as central arrows; i.e., when there are no "effects" involved. Later on, Benton and Hyland extended this work and showed that the correspondence holds for certain cases when the underlying arrow can perform "mild" monadic effects as well: https://pdfs.semanticscholar.org/7b5c/8ed42a65dbd37355088df9dde122efc9653d.pdf
Unfortunately, Benton and Hyland only allow effects that are quite "mild": Effects like the state and environment monads fit the bill, but not general effects like exceptions, lists, or IO. (The fixed point operators for these effectful computations are known as mfix in Haskell, with the type signature (a -> m a) -> m a, and they form the basis of the recursive-do notation.)
It's still an open question how to extend this work to cover arbitrary monadic effects. Though it doesn't seem to be receiving much attention these days. (Would make a great PhD topic for those interested in the correspondence between lambda-calculus, monadic effects, and graph-based computations.)

Rewriting as a practical optimization technique in GHC: Is it really needed?

I was reading the paper authored by Simon Peyton Jones, et al. named “Playing by the Rules: Rewriting as a practical optimization technique in GHC”. In the second section, namely “The basic idea” they write:
Consider the familiar map function, that applies a function to each element of a list. Written in Haskell, map looks like this:
map f [] = []
map f (x:xs) = f x : map f xs
Now suppose that the compiler encounters the following call of map:
map f (map g xs)
We know that this expression is equivalent to
map (f . g) xs
(where “.” is function composition), and we know that the latter expression is more efficient than the former because there is no intermediate list. But the compiler has no such knowledge.
One possible rejoinder is that the compiler should be smarter --- but the programmer will always know things that the compiler cannot figure out. Another suggestion is this: allow the programmer to communicate such knowledge directly to the compiler. That is the direction we explore here.
My question is, why can't we make the compiler smarter? The authors say that “but the programmer will always know things that the compiler cannot figure out”. However, that's not a valid answer because the compiler can indeed figure out that map f (map g xs) is equivalent to map (f . g) xs, and here is how:
map f (map g xs)
map g xs unifies with map f [] = [].
Hence map g [] = [].
map f (map g []) = map f [].
map f [] unifies with map f [] = [].
Hence map f (map g []) = [].
map g xs unifies with map f (x:xs) = f x : map f xs.
Hence map g (x:xs) = g x : map g xs.
map f (map g (x:xs)) = map f (g x : map g xs).
map f (g x : map g xs) unifies with map f (x:xs) = f x : map f xs.
Hence map f (map g (x:xs)) = f (g x) : map f (map g xs).
Hence we now have the rules:
map f (map g []) = []
map f (map g (x:xs)) = f (g x) : map f (map g xs)
As you can see f (g x) is just (f . g) and map f (map g xs) is being called recursively. This is exactly the definition of map (f . g) xs. The algorithm for this automatic conversion seems to be pretty simple. So why not implement this instead of rewriting rules?
Aggressive inlining can derive many of the equalities that rewrite rules are short-hand for.
The differences is that inlining is "blind", so you don't know in advance if the result will be better or worse, or even if it will terminate.
Rewrite rules, however, can do completely non-obvious things, based on much higher level facts about the program. Think of rewrite rules as adding new axioms to the optimizer. By adding these you have a richer rule set to apply, making complicated optimizations easier to apply.
Stream fusion, for example, changes the data type representation. This cannot be expressed through inlining, as it involves a representation type change (we reframe the optimization problem in terms of the Stream ADT). Easy to state in rewrite rules, impossible with inlining alone.
Something in that direction was investigated in a Bachelor’s thesis of Johannes Bader, a student of mine: Finding Equations in Functional Programs (PDF file).
To some degree it is certainly possible, but
it is quite tricky. Finding such equations is in a sense as hard as finding proofs in a theorem proofer, and
it is not often very useful, because it tends to find equations that the programmer would rarely write directly.
It is however useful to clean up after other transformations such as inlining and various form of fusion.
This could be viewed as a balance between balancing expectations in the specific case, and balancing them in the general case. This balance can generate funny situations where you can know how to make something faster, but it is better for the language in general if you don't.
In the specific case of maps in the structure you give, the computer could find optimizations. However, what about related structures? What if the function isn't map? What if there's an additional layer of indirection, such as a function that returns map. In those cases, the compiler cannot optimize easily. This is the general case problem.
How if you do optimize the special case, one of two outcomes occurs
Nobody relies on it, because they aren't sure if it is there or not. In this case, articles like the one you quote get written
People do start relying on it, and now every developer is forced to remember "maps done in this configuration get automatically converted to the fast version for me, but if I do it in this configuration I don't.' This starts to manipulate the way people use the language, and can actually reduce readability!
Given the need for developers to think about such optimizations in the general case, we expect to see developers doing these optimizations in the simple case, decreasing the need to for the optimization in the first place!
Now, if it turns out that the particular case you are interested accounts for something massive like 2% of the world codebase in Haskell, there would be a much stronger argument for applying your special-case optimization.

what is the meaning of "let x = x in x" and "data Float#" in GHC.Prim in Haskell

I looked at the module of GHC.Prim and found that it seems that all datas in GHC.Prim are defined as data Float# without something like =A|B, and all functions in GHC.Prim is defined as gtFloat# = let x = x in x.
My question is whether these definations make sense and what they mean.
I checked the header of GHC.Prim like below
{-
This is a generated file (generated by genprimopcode).
It is not code to actually be used. Its only purpose is to be
consumed by haddock.
-}
I guess it may have some relations with the questions and who could please explain that to me.
It's magic :)
These are the "primitive operators and operations". They are hardwired into the compiler, hence there are no data constructors for primitives and all functions are bottom since they are necessarily not expressable in pure haskell.
(Bottom represents a "hole" in a haskell program, an infinite loop or undefined are examples of bottom)
To put it another way
These data declarations/functions are to provide access to the raw compiler internals. GHC.Prim exists to export these primitives, it doesn't actually implement them or anything (eg its code isn't actually useful). All of that is done in the compiler.
It's meant for code that needs to be extremely optimized. If you think you might need it, some useful reading about the primitives in GHC
A brief expansion of jozefg's answer ...
Primops are precisely those operations that are supplied by the runtime because they can't be defined within the language (or shouldn't be, for reasons of efficiency, say). The true purpose of GHC.Prim is not to define anything, but merely to export some operations so that Haddock can document their existence.
The construction let x = x in x is used at this point in GHC's codebase because the value undefined has not yet been, um, "defined". (That waits until the Prelude.) But notice that the circular let construction, just like undefined, is both syntactically correct and can have any type. That is, it's an infinite loop with the semantics of ⊥, just as undefined is.
... and an aside
Also note that in general the Haskell expression let x = z in y means "change the variable x to the expression z wherever x occurs in the expression y". If you're familiar with the lambda calculus, you should recognize this as the reduction rule for the application of the lambda abstraction \x -> y to the term z. So is the Haskell expression let x = x in x nothing more than some syntax on top of the pure lambda calculus? Let's take a look.
First, we need to account for the recursiveness of Haskell's let expressions. The lambda calculus does not admit recursive definitions, but given a primitive fixed-point operator fix,1 we can encode recursiveness explicitly. For example, the Haskell expression let x = x in x has the same meaning as (fix \r x -> r x) z.2 (I've renamed the x on the right side of the application to z to emphasize that it has no implicit relation to the x inside the lambda).
Applying the usual definition of a fixed-point operator, fix f = f (fix f), our translation of let x = x in x reduces (or rather doesn't) like this:
(fix \r x -> r x) z ==>
(\s y -> s y) (fix \r x -> r x) z ==>
(\y -> (fix \r x -> r x) y) z ==>
(fix \r x -> r x) z ==> ...
So at this point in the development of the language, we've introduced the semantics of ⊥ from the foundation of the (typed) lambda calculus with a built-in fixed-point operator. Lovely!
We need a primitive fixed-point operation (that is, one that is built into the language) because it's impossible to define a fixed-point combinator in the simply typed lambda calculus and its close cousins. (The definition of fix in Haskell's Prelude doesn't contradict this—it's defined recursively, but we need a fixed-point operator to implement recursion.)
If you haven't seen this before, you should read up on fixed-point recursion in the lambda calculus. A text on the lambda calculus is best (there are some free ones online), but some Googling should get you going. The basic idea is that we can convert a recursive definition into a non-recursive one by abstracting over the recursive call, then use a fixed-point combinator to pass our function (lambda abstraction) to itself. The base-case of a well-defined recursive definition corresponds to a fixed point of our function, so the function executes, calling itself over and over again until it hits a fixed point, at which point the function returns its result. Pretty damn neat, huh?

Removing common haskell piping boilerplate

I have some pretty common Haskell boilerplate that shows up in a lot of places. It looks something like this (when instantiating classes):
a <= b = (modify a) <= (modify b)
like this (with normal functions):
fn x y z = fn (foo x) (foo y) (foo z)
and sometimes even with tuples, as in:
mod (x, y) = (alt x, alt y)
It seems like there should be an easy way to reduce all of this boilerplate and not have to repeat myself quite so much. (These are simple examples, but it does get annoying). I imagine that there are abstractions created for removing such boilerplate, but I'm not sure what they're called nor where to look. Can any haskellites point me in the right direction?
For the (<=) case, consider defining compare instead; you can then use Data.Ord.comparing, like so:
instance Ord Foo where
compare = comparing modify
Note that comparing can simply be defined as comparing f = compare `on` f, using Data.Function.on.
For your fn example, it's not clear. There's no way to simplify this type of definition in general. However, I don't think the boilerplate is too bad in this instance.
For mod:
mod = alt *** alt
using Control.Arrow.(***) — read a b c as b -> c in the type signature; arrows are just a general abstraction (like functors or monads) of which functions are an instance. You might like to define both = join (***) (which is itself shorthand for both f = f *** f); I know at least one other person who uses this alias, and I think it should be in Control.Arrow proper.
So, in general, the answer is: combinators, combinators, combinators! This ties directly in with point-free style. It can be overdone, but when the combinators exist for your situation, such code can not only be cleaner and shorter, it can be easier to read: you only have to learn an abstraction once, and can then apply it everywhere when reading code.
I suggest using Hoogle to find these combinators; when you think you see a general pattern underlying a definition, try abstracting out what you think the common parts are, taking the type of the result, and searching for it on Hoogle. You might find a combinator that does just what you want.
So, for instance, for your mod case, you could abstract out alt, yielding \f (a,b) -> (f a, f b), then search for its type, (a -> b) -> (a, a) -> (b, b) — there's an exact match, but it's in the fgl graph library, which you don't want to depend on. Still, you can see how the ability to search by type can be very valuable indeed!
There's also a command-line version of Hoogle with GHCi integration; see its HaskellWiki page for more information.
(There's also Hayoo, which searches the entirety of Hackage, but is slightly less clever with types; which one you use is up to personal preference.)
For some of the boilerplate, Data.Function.on can be helpful, although in these examples, it doesn't gain much
instance Ord Foo where
(<=) = (<=) `on` modify -- maybe
mod = uncurry ((,) `on` alt) -- Not really

Haskell composition (.) vs F#'s pipe forward operator (|>)

In F#, use of the the pipe-forward operator, |>, is pretty common. However, in Haskell I've only ever seen function composition, (.), being used. I understand that they are related, but is there a language reason that pipe-forward isn't used in Haskell or is it something else?
In F# (|>) is important because of the left-to-right typechecking. For example:
List.map (fun x -> x.Value) xs
generally won't typecheck, because even if the type of xs is known, the type of the argument x to the lambda isn't known at the time the typechecker sees it, so it doesn't know how to resolve x.Value.
In contrast
xs |> List.map (fun x -> x.Value)
will work fine, because the type of xs will lead to the type of x being known.
The left-to-right typechecking is required because of the name resolution involved in constructs like x.Value. Simon Peyton Jones has written a proposal for adding a similar kind of name resolution to Haskell, but he suggests using local constraints to track whether a type supports a particular operation or not, instead. So in the first sample the requirement that x needs a Value property would be carried forward until xs was seen and this requirement could be resolved. This does complicate the type system, though.
I am being a little speculative...
Culture: I think |> is an important operator in the F# "culture", and perhaps similarly with . for Haskell. F# has a function composition operator << but I think the F# community tends to use points-free style less than the Haskell community.
Language differences: I don't know enough about both languages to compare, but perhaps the rules for generalizing let-bindings are sufficiently different as to affect this. For example, I know in F# sometimes writing
let f = exp
will not compile, and you need explicit eta-conversion:
let f x = (exp) x // or x |> exp
to make it compile. This also steers people away from points-free/compositional style, and towards the pipelining style. Also, F# type inference sometimes demands pipelining, so that a known type appears on the left (see here).
(Personally, I find points-free style unreadable, but I suppose every new/different thing seems unreadable until you become accustomed to it.)
I think both are potentially viable in either language, and history/culture/accident may define why each community settled at a different "attractor".
More speculation, this time from the predominantly Haskell side...
($) is the flip of (|>), and its use is quite common when you can't write point-free code. So the main reason that (|>) not used in Haskell is that its place is already taken by ($).
Also, speaking from a bit of F# experience, I think (|>) is so popular in F# code because it resembles the Subject.Verb(Object) structure of OO. Since F# is aiming for a smooth functional/OO integration, Subject |> Verb Object is a pretty smooth transition for new functional programmers.
Personally, I like thinking left-to-right too, so I use (|>) in Haskell, but I don't think many other people do.
I think we're confusing things. Haskell's (.) is equivalent to F#'s (>>). Not to be confused with F#'s (|>) which is just inverted function application and is like Haskell's ($) - reversed:
let (>>) f g x = g (f x)
let (|>) x f = f x
I believe Haskell programmers do use $ often. Perhaps not as often as F# programmers tend to use |>. On the other hand, some F# guys use >> to a ridiculous degree: http://blogs.msdn.com/b/ashleyf/archive/2011/04/21/programming-is-pointless.aspx
If you want to use F#'s |> in Haskell then in Data.Function is the & operator (since base 4.8.0.0).
I have seen >>> being used for flip (.), and I often use that myself, especially for long chains that are best understood left-to-right.
>>> is actually from Control.Arrow, and works on more than just functions.
Left-to-right composition in Haskell
Some people use left-to-right (message-passing) style in Haskell too. See, for example, mps library on Hackage. An example:
euler_1 = ( [3,6..999] ++ [5,10..999] ).unique.sum
I think this style looks nice in some situations, but it's harder to read (one needs to know the library and all its operators, the redefined (.) is disturbing too).
There are also left-to-right as well as right-to-left composition operators in Control.Category, part of the base package. Compare >>> and <<< respectively:
ghci> :m + Control.Category
ghci> let f = (+2) ; g = (*3) in map ($1) [f >>> g, f <<< g]
[9,5]
There is a good reason to prefer left-to-right composition sometimes: evaluation order follows reading order.
I think
F#'s pipe forward operator (|>) should vs (&) in haskell.
// pipe operator example in haskell
factorial :: (Eq a, Num a) => a -> a
factorial x =
case x of
1 -> 1
_ -> x * factorial (x-1)
// terminal
ghic >> 5 & factorial & show
If you dont like (&) operator, you can custom it like F# or Elixir :
(|>) :: a -> (a -> b) -> b
(|>) x f = f x
infixl 1 |>
ghci>> 5 |> factorial |> show
Why infixl 1 |>? See the doc in Data-Function (&)
infixl = infix + left associativity
infixr = infix + right associativity
(.)
(.) means function composition. It means (f.g)(x) = f(g(x)) in Math.
foo = negate . (*3)
// ouput -3
ghci>> foo 1
// ouput -15
ghci>> foo 5
it equals
// (1)
foo x = negate (x * 3)
or
// (2)
foo x = negate $ x * 3
($) operator is also defind in Data-Function ($).
(.) is used for create Hight Order Function or closure in js. See example:
// (1) use lamda expression to create a Hight Order Function
ghci> map (\x -> negate (abs x)) [5,-3,-6,7,-3,2,-19,24]
[-5,-3,-6,-7,-3,-2,-19,-24]
// (2) use . operator to create a Hight Order Function
ghci> map (negate . abs) [5,-3,-6,7,-3,2,-19,24]
[-5,-3,-6,-7,-3,-2,-19,-24]
Wow, Less (code) is better.
Compare |> and .
ghci> 5 |> factorial |> show
// equals
ghci> (show . factorial) 5
// equals
ghci> show . factorial $ 5
It is the different between left —> right and right —> left. ⊙﹏⊙|||
Humanization
|> and & is better than .
because
ghci> sum (replicate 5 (max 6.7 8.9))
// equals
ghci> 8.9 & max 6.7 & replicate 5 & sum
// equals
ghci> 8.9 |> max 6.7 |> replicate 5 |> sum
// equals
ghci> (sum . replicate 5 . max 6.7) 8.9
// equals
ghci> sum . replicate 5 . max 6.7 $ 8.9
How to functional programming in object-oriented language?
please visit http://reactivex.io/
It support :
Java: RxJava
JavaScript: RxJS
C#: Rx.NET
C#(Unity): UniRx
Scala: RxScala
Clojure: RxClojure
C++: RxCpp
Lua: RxLua
Ruby: Rx.rb
Python: RxPY
Go: RxGo
Groovy: RxGroovy
JRuby: RxJRuby
Kotlin: RxKotlin
Swift: RxSwift
PHP: RxPHP
Elixir: reaxive
Dart: RxDart
Aside from style and culture, this boils down to optimizing the language design for either pure or impure code.
The |> operator is common in F# largely because it helps to hide two limitations that appear with predominantly-impure code:
Left-to-right type inference without structural subtypes.
The value restriction.
Note that the former limitation does not exist in OCaml because subtyping is structural instead of nominal, so the structural type is easily refined via unification as type inference progresses.
Haskell takes a different trade-off, choosing to focus on predominantly-pure code where these limitations can be lifted.
This is my first day to try Haskell (after Rust and F#), and I was able to define F#'s |> operator:
(|>) :: a -> (a -> b) -> b
(|>) x f = f x
infixl 0 |>
and it seems to work:
factorial x =
case x of
1 -> 1
_ -> x * factorial (x-1)
main =
5 |> factorial |> print
I bet a Haskell expert can give you an even better solution.

Resources