Although I have been learning Haskell for some time, there is one common problem I run into constantly. Let's take this expression as an example:
e f $ g . h i . j
One may wonder: given $ and . from the Prelude, what are the type constraints on e or h for the expression to be valid?
Is it possible to get a 'simpler' but equivalent representation? For me, 'simpler' would be one that uses parentheses everywhere and eliminates the need to rely on operator precedence rules.
If not, which sections of the Haskell Report do I need to read to get the complete picture?
This might be relevant for many novice Haskell programmers. I know many programmers that add parentheses so that they do not need to memorize (or understand) precedence tables like this one: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/operators.html
Is it possible to get a 'simpler' but equivalent representation? Of course, this is called parsing, and is done by compilers, interpreters, etc.
90% of the time, all you need to remember is how $, . and function application f x work together. This is because $ and function application are really simple: they bind the loosest and the tightest respectively, like addition and exponentiation in BODMAS.
From your example
e f $ g . h i . j
the function applications bind first, so we have
(e f) $ g . (h i) . j
Function application is left associative so
f g h ==> ((f g) h)
You may have to google currying to understand why the above can be used like foo(a, b) in other languages.
In the next step, handle everything in the middle. I just use brackets or a precedence table to remember this bit; it's usually straightforward. For example, there are several operators like >> and >>= that get used together when you are working with monads. I just add brackets when GHC complains.
So now we have
(e f) $ (g . ((h i) . j))
The placement of the brackets doesn't matter here, as function composition is associative; Haskell happens to make . right-associative.
So then we have
((e f) (g . ((h i) . j)))
The above (simple) example demonstrates why those operators exist in the first place.
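If you want to convince yourself, here is a small self-contained sketch where the original expression and the fully bracketed form compute the same value. All the definitions below are invented purely for illustration; they are just one set of types under which the expression happens to typecheck.

-- Made-up definitions chosen only so that `e f $ g . h i . j` typechecks.
e :: Int -> (Int -> Int) -> Int
e n k = k n                -- apply the composed pipeline to n

f, i :: Int
f = 10
i = 3

g, j :: Int -> Int
g = (* 2)
j = (+ 1)

h :: Int -> Int -> Int
h = (+)                    -- so (h i) :: Int -> Int

original, bracketed :: Int
original  = e f $ g . h i . j
bracketed = (e f) (g . ((h i) . j))

main :: IO ()
main = print (original == bracketed)   -- prints True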
Related
Today, I was going through the source code of Jane Street's Core_kernel module and I came across the compose function:
(* The typical use case for these functions is to pass in functional arguments
and get functions as a result. For this reason, we tell the compiler where
to insert breakpoints in the argument-passing scheme. *)
let compose f g = (); fun x -> f (g x)
I would have defined the compose function as:
let compose f g x = f (g x)
The reason they give is that, since compose takes the functions f and g as arguments and returns the function fun x -> f (g x) as a result, they defined it this way to tell the compiler to insert a breakpoint after f and g but before x in the argument-passing scheme.
So I have two questions:
Why do we need breakpoints in the argument-passing scheme?
What difference would it make if we defined compose the normal way?
Coming from Haskell, this convention doesn't make any sense to me.
This is an efficiency hack to avoid the cost of a partial application in the expected use case indicated in the comment.
OCaml compiles curried functions into fixed-arity constructs, using a closure to partially apply them where necessary. This means that calls of that arity are efficient - there's no closure construction, just a function call.
There will be a closure construction within compose for fun x -> f (g x), but this will be more efficient than the partial application. Closures generated by partial application go through a wrapper caml_curryN which exists to ensure that effects occur at the correct time (if that closure is itself partially applied).
The fixed arity that the compiler chooses is based on a simple syntactic analysis - essentially, how many arguments are taken in a row without anything in between. The Jane St. programmers have used this to select the arity that they desire by injecting () "in between" arguments.
In short, let compose f g x = f (g x) is a less desirable definition because it would result in the common two-argument case of compose f g being a more expensive partial application.
Semantically, of course, there is no difference at all.
It's worth noting that compilation of partial application has improved in OCaml, and this performance hack is no longer necessary.
I was reading the paper by Simon Peyton Jones et al. titled “Playing by the Rules: Rewriting as a practical optimization technique in GHC”. In the second section, namely “The basic idea”, they write:
Consider the familiar map function, that applies a function to each element of a list. Written in Haskell, map looks like this:
map f [] = []
map f (x:xs) = f x : map f xs
Now suppose that the compiler encounters the following call of map:
map f (map g xs)
We know that this expression is equivalent to
map (f . g) xs
(where “.” is function composition), and we know that the latter expression is more efficient than the former because there is no intermediate list. But the compiler has no such knowledge.
One possible rejoinder is that the compiler should be smarter --- but the programmer will always know things that the compiler cannot figure out. Another suggestion is this: allow the programmer to communicate such knowledge directly to the compiler. That is the direction we explore here.
My question is, why can't we make the compiler smarter? The authors say that “but the programmer will always know things that the compiler cannot figure out”. However, that's not a valid answer because the compiler can indeed figure out that map f (map g xs) is equivalent to map (f . g) xs, and here is how:
map f (map g xs)
map g xs unifies with map f [] = [].
Hence map g [] = [].
map f (map g []) = map f [].
map f [] unifies with map f [] = [].
Hence map f (map g []) = [].
map g xs unifies with map f (x:xs) = f x : map f xs.
Hence map g (x:xs) = g x : map g xs.
map f (map g (x:xs)) = map f (g x : map g xs).
map f (g x : map g xs) unifies with map f (x:xs) = f x : map f xs.
Hence map f (map g (x:xs)) = f (g x) : map f (map g xs).
Hence we now have the rules:
map f (map g []) = []
map f (map g (x:xs)) = f (g x) : map f (map g xs)
As you can see, f (g x) is just (f . g) x, and map f (map g xs) is being called recursively. This is exactly the definition of map (f . g) xs. The algorithm for this automatic conversion seems to be pretty simple. So why not implement this instead of rewrite rules?
Aggressive inlining can derive many of the equalities that rewrite rules are short-hand for.
The difference is that inlining is "blind": you don't know in advance whether the result will be better or worse, or even whether it will terminate.
Rewrite rules, however, can do completely non-obvious things, based on much higher level facts about the program. Think of rewrite rules as adding new axioms to the optimizer. By adding these you have a richer rule set to apply, making complicated optimizations easier to apply.
Stream fusion, for example, changes the data type representation. This cannot be expressed through inlining, as it involves a representation type change (we reframe the optimization problem in terms of the Stream ADT). Easy to state in rewrite rules, impossible with inlining alone.
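For reference, this is roughly what the paper's map/map rule looks like when expressed as a GHC RULES pragma. This is only a sketch: in GHC's base library the rule sits next to the definition of map, and in practice GHC's own list-fusion rules for map usually fire instead.

module MapMapRule where

-- The "map/map" rule from the paper as a rewrite rule. With optimisation
-- enabled (-O), GHC rewrites any call site matching the left-hand side.
{-# RULES
"map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
  #-}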
Something in that direction was investigated in a Bachelor’s thesis of Johannes Bader, a student of mine: Finding Equations in Functional Programs (PDF file).
To some degree it is certainly possible, but
it is quite tricky. Finding such equations is, in a sense, as hard as finding proofs in a theorem prover, and
it is often not very useful, because it tends to find equations that the programmer would rarely write directly.
It is, however, useful for cleaning up after other transformations such as inlining and various forms of fusion.
This could be viewed as a trade-off between meeting expectations in the specific case and meeting them in the general case. This balance can generate funny situations where you know how to make something faster, but it is better for the language in general if you don't.
In the specific case of maps in the structure you give, the compiler could find the optimization. However, what about related structures? What if the function isn't map? What if there's an additional layer of indirection, such as a function that returns map? In those cases, the compiler cannot optimize easily. This is the general-case problem.
Now, if you do optimize the special case, one of two outcomes occurs:
Nobody relies on it, because they aren't sure whether it is there or not. In this case, articles like the one you quote get written.
People do start relying on it, and now every developer is forced to remember "maps done in this configuration get automatically converted to the fast version for me, but if I do it in that configuration they don't." This starts to manipulate the way people use the language, and can actually reduce readability!
Given the need for developers to think about such optimizations in the general case, we expect to see developers doing these optimizations in the simple case anyway, decreasing the need for the optimization in the first place!
Now, if it turns out that the particular case you are interested in accounts for something massive, like 2% of the world's Haskell codebase, there would be a much stronger argument for applying your special-case optimization.
I'm writing a custom language that features some functional elements. When I get stuck somewhere, I usually check how Haskell does it. This time, though, the problem is a bit too complicated for me to think of an example to give to Haskell.
Here's how it goes.
Say we have the following line
a . b
in Haskell.
Obviously, we are composing two functions, a and b. But what if the function a took another two functions as parameters? What's stopping it from being applied to . and b? You can surround the expression in brackets, but that shouldn't make a difference, since the bracketed expression still evaluates to a function, a prefix one, and prefix functions have precedence over infix functions.
If you do
(+) 2 3 * 5
for example, it will output 25 instead of 17.
Basically what I'm asking is, what mechanism does Haskell use when you want an infix function to operate before a preceding prefix function.
So. If "a" is a function that takes two functions as its parameters. How do you stop Haskell from interpreting
a . b
as "apply . and b to the function a"
and instead make it interpret it as "compose the functions a and b"?
If you don't put parens around an operator, it's always parsed as infix; i.e. as an operator, not an operand.
E.g. if you have f g ? i j, there are no parens around ?, so the whole thing is a call to (?) (parsed as (f g) ? (i j), equivalent to (?) (f g) (i j)).
I think what you're looking for are fixity declarations (see The Haskell Report).
They basically allow you to declare the operator precedence of infix functions.
For instance, there is
infixl 7 *
infixl 6 +
which means that + and * are both left associative infix operators.
* has precedence 7 while + has precedence 6, i.e. * binds more tightly than +.
In the report page, you can also see that . is defined as infixr 9 .
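As a small sketch of how this looks for your own operator (the name |>> is made up for illustration), a fixity declaration controls both precedence and associativity:

-- A custom operator with an explicit fixity declaration.
infixl 1 |>>                        -- infix, left-associative, precedence 1

(|>>) :: a -> (a -> b) -> b
x |>> f = f x

example :: Int
example = 1 |>> (+ 2) |>> (* 3)     -- parses as ((1 |>> (+ 2)) |>> (* 3)) == 9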
Basically what I'm asking is, what mechanism does Haskell use when you
want an infix function to operate before a preceding prefix function.
Just to point out a misconception: This is purely a matter of how expressions are parsed. The Haskell compiler does not know (or: does not need to know) if, in
f . g
f, g and (.) are functions, or whatever.
It goes the other way around:
The parser sees f . g (or the syntactically equivalent i + j).
It hands this up as something like App (App (.) f) g, following the lexical and syntax rules.
Only then, when the typechecker sees App a b, does it conclude that a must be a function.
(+) 2 3 * 5
is parsed as
((+) 2 3) * 5
and thus
(2 + 3) * 5
That is because prefix function applications (like (+) 2 3) bind more tightly than functions in infix notation, like *.
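You can check both parses yourself with nothing but Prelude operators:

prefixApplication :: Int
prefixApplication = (+) 2 3 * 5     -- parsed as ((+) 2 3) * 5 == 25

infixPrecedence :: Int
infixPrecedence = 2 + 3 * 5         -- parsed as 2 + (3 * 5) == 17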
In F#, use of the pipe-forward operator, |>, is pretty common. However, in Haskell I've only ever seen function composition, (.), being used. I understand that they are related, but is there a language reason that pipe-forward isn't used in Haskell, or is it something else?
In F# (|>) is important because of the left-to-right typechecking. For example:
List.map (fun x -> x.Value) xs
generally won't typecheck, because even if the type of xs is known, the type of the argument x to the lambda isn't known at the time the typechecker sees it, so it doesn't know how to resolve x.Value.
In contrast
xs |> List.map (fun x -> x.Value)
will work fine, because the type of xs will lead to the type of x being known.
The left-to-right typechecking is required because of the name resolution involved in constructs like x.Value. Simon Peyton Jones has written a proposal for adding a similar kind of name resolution to Haskell, but he suggests using local constraints to track whether a type supports a particular operation or not, instead. So in the first sample the requirement that x needs a Value property would be carried forward until xs was seen and this requirement could be resolved. This does complicate the type system, though.
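For contrast, here is a hedged Haskell sketch (the Item type and its value field are invented for illustration): a record selector is an ordinary top-level function with a known type, so no left-to-right flow of type information is needed and the pipeline brings no inference benefit.

data Item = Item { value :: Int }

-- `value :: Item -> Int` is known up front, so this typechecks without
-- needing the type of xs to be seen first.
values :: [Item] -> [Int]
values xs = map value xs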
I am being a little speculative...
Culture: I think |> is an important operator in the F# "culture", and perhaps similarly with . for Haskell. F# has a function composition operator << but I think the F# community tends to use points-free style less than the Haskell community.
Language differences: I don't know enough about both languages to compare, but perhaps the rules for generalizing let-bindings are sufficiently different as to affect this. For example, I know in F# sometimes writing
let f = exp
will not compile, and you need explicit eta-conversion:
let f x = (exp) x // or x |> exp
to make it compile. This also steers people away from points-free/compositional style, and towards the pipelining style. Also, F# type inference sometimes demands pipelining, so that a known type appears on the left (see here).
(Personally, I find points-free style unreadable, but I suppose every new/different thing seems unreadable until you become accustomed to it.)
I think both are potentially viable in either language, and history/culture/accident may define why each community settled at a different "attractor".
More speculation, this time from the predominantly Haskell side...
($) is the flip of (|>), and its use is quite common when you can't write point-free code. So the main reason that (|>) is not used in Haskell is that its place is already taken by ($).
Also, speaking from a bit of F# experience, I think (|>) is so popular in F# code because it resembles the Subject.Verb(Object) structure of OO. Since F# is aiming for a smooth functional/OO integration, Subject |> Verb Object is a pretty smooth transition for new functional programmers.
Personally, I like thinking left-to-right too, so I use (|>) in Haskell, but I don't think many other people do.
I think we're confusing things. Haskell's (.) corresponds to F#'s (<<), i.e. the flip of F#'s (>>) shown below. Not to be confused with F#'s (|>), which is just reversed function application and is like Haskell's ($) flipped:
let (>>) f g x = g (f x)
let (|>) x f = f x
I believe Haskell programmers do use $ often. Perhaps not as often as F# programmers tend to use |>. On the other hand, some F# guys use >> to a ridiculous degree: http://blogs.msdn.com/b/ashleyf/archive/2011/04/21/programming-is-pointless.aspx
If you want to use F#'s |> in Haskell, then Data.Function provides the & operator (since base 4.8.0.0).
I have seen >>> being used for flip (.), and I often use that myself, especially for long chains that are best understood left-to-right.
>>> is actually from Control.Arrow, and works on more than just functions.
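A quick GHCi sketch of that:

ghci> :m + Control.Arrow
ghci> ((+ 2) >>> (* 3)) 1     -- same as ((* 3) . (+ 2)) 1
9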
Left-to-right composition in Haskell
Some people use a left-to-right (message-passing) style in Haskell too. See, for example, the mps library on Hackage. An example:
euler_1 = ( [3,6..999] ++ [5,10..999] ).unique.sum
I think this style looks nice in some situations, but it's harder to read (one needs to know the library and all its operators, the redefined (.) is disturbing too).
There are also left-to-right as well as right-to-left composition operators in Control.Category, part of the base package. Compare >>> and <<< respectively:
ghci> :m + Control.Category
ghci> let f = (+2) ; g = (*3) in map ($1) [f >>> g, f <<< g]
[9,5]
There is a good reason to prefer left-to-right composition sometimes: evaluation order follows reading order.
I think F#'s pipe-forward operator (|>) corresponds to (&) in Haskell.
-- pipe operator example in Haskell; (&) comes from Data.Function
import Data.Function ((&))

factorial :: (Eq a, Num a) => a -> a
factorial x =
  case x of
    1 -> 1
    _ -> x * factorial (x - 1)

-- then, in GHCi:
ghci> 5 & factorial & show
"120"
If you don't like the (&) operator, you can define your own, like F# or Elixir:
(|>) :: a -> (a -> b) -> b
(|>) x f = f x
infixl 1 |>
ghci> 5 |> factorial |> show
Why infixl 1 |>? See the documentation for (&) in Data.Function.
infixl = infix + left associativity
infixr = infix + right associativity
(.)
(.) means function composition: (f . g) x = f (g x), just like (f ∘ g)(x) = f(g(x)) in mathematics.
foo = negate . (*3)

ghci> foo 1
-3
ghci> foo 5
-15
it equals
-- (1)
foo x = negate (x * 3)
or
-- (2)
foo x = negate $ x * 3
The ($) operator is also defined in Data.Function.
(.) is used to create higher-order functions, much like closures in JavaScript. See this example:
-- (1) use a lambda expression to create a higher-order function
ghci> map (\x -> negate (abs x)) [5,-3,-6,7,-3,2,-19,24]
[-5,-3,-6,-7,-3,-2,-19,-24]
-- (2) use the . operator to create a higher-order function
ghci> map (negate . abs) [5,-3,-6,7,-3,2,-19,24]
[-5,-3,-6,-7,-3,-2,-19,-24]
Wow, less (code) is better.
Compare |> and .
ghci> 5 |> factorial |> show
-- equals
ghci> (show . factorial) 5
-- equals
ghci> show . factorial $ 5
It is the difference between reading left -> right and right -> left.
Readability
|> and & can be easier to read than ., because:
ghci> sum (replicate 5 (max 6.7 8.9))
-- equals
ghci> 8.9 & max 6.7 & replicate 5 & sum
-- equals
ghci> 8.9 |> max 6.7 |> replicate 5 |> sum
-- equals
ghci> (sum . replicate 5 . max 6.7) 8.9
-- equals
ghci> sum . replicate 5 . max 6.7 $ 8.9
How do you do functional programming in an object-oriented language?
Please visit http://reactivex.io/
It supports:
Java: RxJava
JavaScript: RxJS
C#: Rx.NET
C#(Unity): UniRx
Scala: RxScala
Clojure: RxClojure
C++: RxCpp
Lua: RxLua
Ruby: Rx.rb
Python: RxPY
Go: RxGo
Groovy: RxGroovy
JRuby: RxJRuby
Kotlin: RxKotlin
Swift: RxSwift
PHP: RxPHP
Elixir: reaxive
Dart: RxDart
Aside from style and culture, this boils down to optimizing the language design for either pure or impure code.
The |> operator is common in F# largely because it helps to hide two limitations that appear with predominantly-impure code:
Left-to-right type inference without structural subtypes.
The value restriction.
Note that the former limitation does not exist in OCaml because subtyping is structural instead of nominal, so the structural type is easily refined via unification as type inference progresses.
Haskell takes a different trade-off, choosing to focus on predominantly-pure code where these limitations can be lifted.
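As a rough illustration of the second point (a sketch with a made-up name): a point-free, partially applied definition like the one below is unremarkable at the top level in Haskell, whereas under the ML value restriction the analogous partial application would not be generalised to a polymorphic type without eta-expanding it.

-- A top-level point-free partial application; with the signature it gets
-- the full polymorphic type, no eta-expansion required.
mapTwice :: (a -> a) -> [a] -> [a]
mapTwice = map . (\f -> f . f)    -- applies f twice to every list element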
This is my first day trying Haskell (after Rust and F#), and I was able to define F#'s |> operator:
(|>) :: a -> (a -> b) -> b
(|>) x f = f x
infixl 0 |>
and it seems to work:
factorial x =
  case x of
    1 -> 1
    _ -> x * factorial (x - 1)

main =
  5 |> factorial |> print
I bet a Haskell expert can give you an even better solution.
From what I'm reading, $ is described as "applies a function to its arguments." However, it doesn't seem to work quite like (apply ...) in Lisp, because it's a binary operator, so really the only thing it looks like it does is help to avoid parentheses sometimes, like foo $ bar quux instead of foo (bar quux). Am I understanding it right? Is the latter form considered "bad style"?
$ is preferred to parentheses when the distance between the opening and closing parens would otherwise be greater than good readability warrants, or if you have several layers of nested parentheses.
For example
i (h (g (f x)))
can be rewritten
i $ h $ g $ f x
In other words, it represents right-associative function application. This is useful because ordinary function application associates to the left, i.e. the following
i h g f x
...can be rewritten as follows
(((i h) g) f) x
Other handy uses of the ($) function include zipping a list with it:
zipWith ($) fs xs
This applies each function in a list of functions fs to a corresponding argument in the list xs, and collects the results in a list. Contrast with sequence fs x which applies a list of functions fs to a single argument x and collects the results; and fs <*> xs which applies each function in the list fs to every element of the list xs.
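A quick GHCi sketch of the three behaviours, with arbitrarily chosen inputs:

ghci> zipWith ($) [(+ 1), (* 2)] [10, 20]   -- pairwise application
[11,40]
ghci> sequence [(+ 1), (* 2)] 10            -- each function applied to one argument
[11,20]
ghci> [(+ 1), (* 2)] <*> [10, 20]           -- every function applied to every argument
[11,21,20,40]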
You're mostly understanding it right: about 99% of the use of $ is to help avoid parentheses, and yes, it does appear to be preferred to parentheses in most cases.
Note, though:
> :t ($)
($) :: (a -> b) -> a -> b
That is, $ is a function; as such, it can be passed to functions, composed with, and anything else you want to do with it. I think I've seen it used by people screwing with combinators before.
The documentation of ($) answers your question. Unfortunately it isn't listed in the automatically generated documentation of the Prelude.
However, it is listed in the source code, which you can find here:
http://darcs.haskell.org/packages/base/Prelude.hs
However this module doesn't define ($) directly. The following, which is imported by the former, does:
http://darcs.haskell.org/packages/base/GHC/Base.lhs
I included the relevant code below:
infixr 0 $
...
-- | Application operator. This operator is redundant, since ordinary
-- application #(f x)# means the same as #(f '$' x)#. However, '$' has
-- low, right-associative binding precedence, so it sometimes allows
-- parentheses to be omitted; for example:
--
-- > f $ g $ h x = f (g (h x))
--
-- It is also useful in higher-order situations, such as #'map' ('$' 0) xs#,
-- or #'Data.List.zipWith' ('$') fs xs#.
{-# INLINE ($) #-}
($) :: (a -> b) -> a -> b
f $ x = f x
Lots of good answers above, but one omission:
$ cannot always be replaced by parentheses
But any application of $ can be eliminated by using parentheses, and any use of ($) can be replaced by id, since $ is a specialization of the identity function. Uses of (f $) can be replaced by f, but a use like ($ x) (take a function as an argument and apply it to x) doesn't have any obvious replacement that I see.
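For example, the section ($ 3) applies each function in a list to the argument 3:

ghci> map ($ 3) [(+ 1), (* 2), negate]
[4,6,-3]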
If I look at your question and the answers here, Apocalisp and you are both right:
$ is preferred to parentheses under certain circumstances (see his answer)
foo (bar quux) is certainly not bad style!
Also, please check out the difference between . (dot) and $ (dollar sign), another SO question very much related to yours.