Eta-conversion changes semantics in a strict language - haskell

Take this OCaml code:
let silly (g : (int -> int) -> int) (f : int -> int -> int) =
g (f (print_endline "evaluated"; 0))
silly (fun _ -> 0) (fun x -> fun y -> x + y)
It prints evaluated and returns 0. But if I eta-expand f to get g (fun x -> f (print_endline "evaluated"; 0) x), evaluated is no longer printed.
Same holds for this SML code:
fun silly (g : (int -> int) -> int, f : int -> int -> int) : int =
g (f (print "evaluated" ; 0));
silly ((fn _ => 0), fn x => fn y => x + y);
On the other hand, this Haskell code doesn't print evaluated even with the strict pragma:
{-# LANGUAGE Strict #-}
import Debug.Trace
silly :: ((Int -> Int) -> Int) -> (Int -> Int -> Int) -> Int
silly g f = g (f (trace "evaluated" 0))
main = print $ silly (const 0) (+)
(I can make it print by using seq, though, which makes perfect sense to me.)
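For concreteness, one way the seq variant might look is sketched below. The local name x is introduced only for illustration; the idea is to force the traced argument before g ever receives the partial application.

```haskell
import Debug.Trace (trace)

silly :: ((Int -> Int) -> Int) -> (Int -> Int -> Int) -> Int
silly g f =
  let x = trace "evaluated" 0   -- illustrative local name, not in the original code
  in x `seq` g (f x)            -- force x first, then proceed as before

main :: IO ()
main = print (silly (const 0) (+))
```

With this definition the trace message is emitted on stderr and the program still prints 0.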
While I understand that OCaml and SML do the right thing theoretically, is there any practical reason to prefer this behaviour to the "lazier" one? Eta-contraction is a common refactoring tool and I'm totally scared of using it in a strict language. I feel like I should paranoidly eta-expand everything, just because otherwise arguments to partially applied functions can be evaluated when they're not supposed to be. When is the "strict" behaviour useful?
Why and how does Haskell behave differently under the Strict pragma? Are there any references I can familiarize myself with to better understand the design space and pros and cons of the existing approaches?

To address the technical part of your question: eta-conversion also changes the meaning of expressions in lazy languages. You just need to consider the eta-rule of a different type constructor, e.g., + instead of ->.
This is the eta-rule for binary sums:
(case e of Lft y -> f (Lft y) | Rgt y -> f (Rgt y)) = f e (eta-+)
This equation holds under eager evaluation, because e will always be reduced on both sides. Under lazy evaluation, however, the r.h.s. only reduces e if f also forces it. That might make the l.h.s. diverge where the r.h.s. would not. So the equation does not hold in a lazy language.
To make it concrete in Haskell:
f x = 0
lhs = case undefined of Left y -> f (Left y); Right y -> f (Right y)
rhs = f undefined
Here, trying to print lhs will diverge, whereas rhs yields 0.
There is more that could be said about this, but the essence is that the equational theories of both evaluation regimes are sort of dual.
The underlying problem is that under a lazy regime, every type is inhabited by _|_ (non-termination), whereas under an eager regime it is not. That has severe semantic consequences. In particular, there are no inductive types in Haskell, and you cannot prove termination of a structurally recursive function, e.g., a list traversal.
There is a line of research in type theory distinguishing data types (strict) from codata types (non-strict) and providing both in a dual manner, thus giving the best of both worlds.
Edit: As for the question why a compiler should not eta-expand functions: that would utterly break every language. In a strict language with effects that's most obvious, because the ability to stage effects via multiple function abstractions is a feature. The simplest example perhaps is this:
let make_counter () =
let x = ref 0 in
fun () -> x := !x + 1; !x
let tick = make_counter ()
let n1 = tick ()
let n2 = tick ()
let n3 = tick ()
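As a rough Haskell analogue (a sketch, not part of the OCaml example), the same staging shows up directly in the type: the outer IO action allocates the state once, and the inner IO action is the counter itself. The name makeCounter mirrors make_counter above; using IORef is an assumption made for this illustration.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Outer action: allocate the reference once. Inner action: bump and read it.
makeCounter :: IO (IO Int)
makeCounter = do
  ref <- newIORef 0
  pure $ do
    modifyIORef' ref (+ 1)
    readIORef ref

main :: IO ()
main = do
  tick <- makeCounter
  n1 <- tick
  n2 <- tick
  n3 <- tick
  print (n1, n2, n3)   -- (1,2,3)
```

Collapsing the two stages into one would allocate a fresh reference on every tick, which is exactly what eta-expanding make_counter would do in the OCaml version.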
But effects are not the only reason. Eta-expansion can also drastically change the performance of a program! In the same way you sometimes want to stage effects you sometimes also want to stage work:
match :: String -> String -> Bool
match regex = \s -> run fsm s
where fsm = ...expensive transformation of regex...
matchFloat = match "[0-9]+(\\.[0-9]*)?((e|E)(\\+|-)?[0-9]+)?"
Note that I used Haskell here, because this example shows that implicit eta-expansion is not desirable in either eager or lazy languages!
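To make the staging visible, here is a minimal runnable sketch. Fsm, compile and run are hypothetical stand-ins invented for this illustration (run does a plain equality test, not real regex matching); the trace marks each time the "expensive" step is actually performed.

```haskell
import Debug.Trace (trace)

-- Hypothetical stand-in for a compiled automaton: it just remembers the pattern.
newtype Fsm = Fsm String

-- Hypothetical "expensive" compilation step, marked with a trace.
compile :: String -> Fsm
compile pat = trace ("compiling " ++ pat) (Fsm pat)

-- Stand-in matcher: literal equality instead of real regex matching.
run :: Fsm -> String -> Bool
run (Fsm pat) s = pat == s

match :: String -> String -> Bool
match regex = \s -> run fsm s
  where fsm = compile regex   -- bound outside the lambda, shared by later calls

matchAbc :: String -> Bool
matchAbc = match "abc"

main :: IO ()
main = print (map matchAbc ["abc", "abd", "abc"])
```

Because fsm is bound outside the lambda, the trace message is expected to appear once for all three calls. If match were eta-expanded to match regex s = ..., the compilation would be redone on every call unless the optimiser happened to recover the sharing.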

With respect to your final question (why does Haskell do this), the reason "Strict Haskell" behaves differently from a truly strict language is that the Strict extension doesn't really change the evaluation model from lazy to strict. It just makes a subset of bindings into "strict" bindings by default, and only in the limited Haskell sense of forcing evaluation to weak head normal form. Also, it only affects bindings made in the module with the extension turned on; it doesn't retroactively affect bindings made elsewhere. (Moreover, as described below, the strictness doesn't take effect in partial function application. The function needs to be fully applied before any arguments are forced.)
In your particular Haskell example, I believe the only effect of the Strict extension is as if you had explicitly written the following bang patterns in the definition of silly:
silly !g !f = g (f (trace "evaluated" 0))
It has no other effect. In particular, it doesn't make const or (+) strict in their arguments, nor does it generally change the semantics of function applications to make them eager.
So, when the term silly (const 0) (+) is forced by print, the only effect is to evaluate its arguments to WHNF as part of the function application of silly. The effect is similar to writing (in non-Strict Haskell):
let { g = const 0; f = (+) } in g `seq` f `seq` silly g f
Obviously, forcing g and f to their WHNFs (which are lambdas) isn't going to have any side effect, and when silly is applied, const 0 is still lazy in its remaining argument, so the resulting term is something like:
(\x -> 0) ((\x y -> <defn of plus>) (trace "evaluated" 0))
(which should be interpreted without the Strict extension -- these are all lazy bindings here), and there's nothing here that will force the side effect.
As noted above, there's another subtle issue that this example glosses over. Even if you had made everything in sight strict:
{-# LANGUAGE Strict #-}
import Debug.Trace
myConst :: a -> b -> a
myConst x y = x
myPlus :: Int -> Int -> Int
myPlus x y = x + y
silly :: ((Int -> Int) -> Int) -> (Int -> Int -> Int) -> Int
silly g f = g (f (trace "evaluated" 0))
main = print $ silly (myConst 0) myPlus
this still wouldn't have printed "evaluated". This is because, in the evaluation of silly when the strict version of myConst forces its second argument, that argument is a partial application of the strict version of myPlus, and myPlus won't force any of its arguments until it's been fully applied.
This also means that if you change the definition of myPlus to:
myPlus x = \y -> x + y -- now it will print "evaluated"
then you'll be able to largely reproduce the ML behavior. Because myPlus is now fully applied, it will force its argument, and this will print "evaluated". You can suppress it again by eta-expanding f in the definition of silly:
silly g f = g (\x -> f (trace "evaluated" 0) x) -- now it won't
because now when myConst forces its second argument, that argument is already in WHNF (because it's a lambda), and we never get to the application of f, full or not.
In the end, I guess I wouldn't take "Haskell plus the Strict extension and unsafe side effects like trace" too seriously as a good point in the design space. Its semantics may be (barely) coherent, but they sure are weird. I think the only serious use case is when you have some code whose semantics "obviously" don't depend on lazy versus strict evaluation but where performance would be improved by a lot of forcing. Then, you can just turn on Strict for a performance boost without having to think too hard.


Is `Monad` constraint necessary in `<$!>`

As claimed in the documentation <$!> is the strict version of <$>, but surprisingly
<$!> :: Monad m => (a -> b) -> m a -> m b
f <$!> m = do
x <- m
let z = f x
z `seq` return z
instead of the more natural (in my opinion; because it keeps the weaker constraint and mimics $!)
<$!> :: Functor f => (a -> b) -> f a -> f b
f <$!> x = x `seq` (f <$> x)
I guess that applying seq after the binding is different from the "natural" approach, but I don't know how different it is. My question is: is there any reason that makes the "natural" approach useless, and is that why the implementation is constrained to Monad?
GHC's commit message includes two links that shed more light on this function.
This is the reason Johan Tibell gave for it (quoting from the linked mailing list):
It works on Monads instead of Functors as required by us inspecting
the argument.
This version is highly convenient if you want to work with
functors/applicatives in e.g. parser and avoid spurious thunks at the
same time. I realized that it was needed while fixing large space
usage (but not space-leak) issues in cassava.
I guess that applying seq after the binding is different from the "natural" approach, but I don't know how different it is
Since Haskell is functional, seq must work through data dependencies; it sets up a relationship: "when seq x y is evaluated to WHNF, x will have been as well".
The idea here is to pin the evaluation of a to the outer m a which we know must be evaluated for each >>= or <*> to proceed.
In your version:
Prelude> f <$!> x = x `seq` (f <$> x)
Prelude> let thunk = error "explode"
Prelude> case (+) <$!> Just thunk <*> Just thunk of ; Just _ -> "we can easily build up thunks"
"we can easily build up thunks"
I do wonder if there's a better solution possible though
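For comparison, the library version from Control.Monad does force the mapped result once the outer m a is evaluated. The sketch below contrasts it with plain <$> on Maybe; throwsOnWhnf is a helper name made up for this illustration.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Control.Monad ((<$!>))

-- Evaluate a value to WHNF and report whether that raised an ErrorCall.
throwsOnWhnf :: a -> IO Bool
throwsOnWhnf v = do
  r <- try (evaluate v)
  pure $ case r of
    Left e  -> const True (e :: ErrorCall)
    Right _ -> False

main :: IO ()
main = do
  let boom = error "explode" :: Int
  lazy   <- throwsOnWhnf ((+ 1) <$> Just boom)    -- Just <thunk>: nothing forced
  strict <- throwsOnWhnf ((+ 1) <$!> Just boom)   -- forces boom + 1 inside the bind
  print (lazy, strict)   -- (False,True)
```

The Monad constraint is what lets <$!> get at the value inside and seq the result of f before wrapping it again.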

How to define the type of a non-recursive function that still has a recursive type

Given is a Javascript function like
const isNull = x => x === null ? [] : [isNull];
Such a function might be nonsense, which is not the question, though.
When I tried to express a Haskell-like type annotation, I failed. Likewise with attempting to implement a similar function in Haskell:
let isZero = \n -> if n == 0 then [] else [isZero] -- doesn't compile
Is there a term for this kind of function, which isn't recursive itself but is recursive in its type? Can such functions be expressed only in dynamically typed languages?
Sorry if this is obvious - my Haskell knowledge (including strict type systems) is rather superficial.
You need to define an explicit recursive type for that.
newtype T = T (Int -> [T])
isZero :: T
isZero = T (\n -> if n == 0 then [] else [isZero])
The price to pay is the wrapping/unwrapping of the T constructor, but it is feasible.
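A small usage sketch of that wrapping and unwrapping; the helper name apply is introduced here for illustration only.

```haskell
newtype T = T (Int -> [T])

isZero :: T
isZero = T (\n -> if n == 0 then [] else [isZero])

-- Unwrap the constructor to get at the underlying function.
apply :: T -> Int -> [T]
apply (T f) = f

main :: IO ()
main = do
  print (length (apply isZero 0))   -- 0
  print (length (apply isZero 5))   -- 1
```

The result list contains values of type T again, so they can only be compared by applying them further; that is why the example prints lengths rather than the values themselves.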
If you want to emulate a Javascript-like untyped world (AKA unityped, or dynamically typed), you can even use
data Value
= VInt Int
| VList [Value]
| VFun (Value -> Value)
(beware of a known bug)
In principle, every Javascript value can be represented by the above huge sum type. For example, application becomes something like
jsApply (VFun f) v = f v
jsApply _ _ = error "Can not apply a non-function value"
Note how static type checks are turned into dynamic checks, in this way. Similarly, static type errors are turned into runtime errors.
Chi showed how such an infinite type can be implemented: you need a newtype wrapper to “hide” the infinite recursion.
An intriguing alternative is to use a fixpoint formulation. Recall that you could pseudo-define something recursive like your example as
isZero = fix $ \f n -> if n == 0 then [] else [f]
Likewise, the type can actually be expressed as a fixpoint of the relevant functors, namely of the composition of (Int ->) and [] (which in transformer gestalt is ListT):
isZero :: Fix (ListT ((->) Int))
isZero = Fix . ListT $ \n -> if n==0 then [] else [isZero]
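A self-contained sketch of the same idea using only base: Fix is defined locally, and Compose from Data.Functor.Compose stands in for ListT (an assumption made here because ListT may not be available in current versions of transformers). The names IsZero and step are illustrative.

```haskell
import Data.Functor.Compose (Compose (..))

-- Fixpoint of a functor, defined locally to keep the example self-contained.
newtype Fix f = Fix (f (Fix f))

-- Composition of (Int ->) and []: unfolds to Int -> [IsZero].
type IsZero = Fix (Compose ((->) Int) [])

isZero :: IsZero
isZero = Fix . Compose $ \n -> if n == 0 then [] else [isZero]

-- Peel off one layer of Fix and Compose to recover the function.
step :: IsZero -> Int -> [IsZero]
step (Fix (Compose f)) = f

main :: IO ()
main = print (length (step isZero 0), length (step isZero 3))   -- (0,1)
```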
Also worth noting is that you probably don't really want ListT there. MaybeT would be more natural if you only ever have zero or one elements. Even more nicely though, you can use the fact that functor fixpoints are closely related to the free monad, which gives you exactly that “possibly trivial” alternative case:
isZero' :: Free ((->) Int) ()
isZero' = wrap $ \n -> if n==0 then Pure () else isZero'
Pure () is just return () in the monad instance, so you can as well replace the if construct with the standard when:
isZero' = wrap $ \n -> when (n/=0) isZero'

defining functions with/without lambdas

What difference does it make whether I define a function with a lambda expression or without one when compiling the module with GHC?
f :: A -> B
f = \x -> ...
f :: A -> B
f x = ...
I think I saw that it helps the compiler to inline the function, but other than that, can it have an impact on my code if I change from the first to the second version?
I am trying to understand someone else's code and get behind the reasoning why this function is defined in the first and not the second way.
To answer that question, I wrote a little program with both ways, and looked at the Core generated:
f1 :: Int -> Int
f1 = \x -> x + 2
{-# NOINLINE f1 #-}
f2 :: Int -> Int
f2 x = x + 2
{-# NOINLINE f2 #-}
I get the core by running ghc test.hs -ddump-simpl. The relevant part is:
f1_rjG :: Int -> Int
[GblId, Arity=1, Str=DmdType]
f1_rjG =
\ (x_alH :: Int) -> + # Int GHC.Num.$fNumInt x_alH (GHC.Types.I# 2)
f2_rlx :: Int -> Int
[GblId, Arity=1, Str=DmdType]
f2_rlx =
\ (x_amG :: Int) -> + # Int GHC.Num.$fNumInt x_amG (GHC.Types.I# 2)
The results are identical, so to answer your question: there is no impact from changing from one form to the other.
That being said, I recommend looking at leftaroundabout's answer, which deals with the cases where there actually is a difference.
First off, the second form is just more flexible (it allows you to do pattern matching, with other clauses below for alternative cases).
When there's only one clause, it's actually equivalent to a lambda... unless you have a where scope. Namely,
f = \x -> someCalculation x y
where y = expensiveConstCalculation
is more efficient than
f x = someCalculation x y
where y = expensiveConstCalculation
because in the latter, y is always recalculated when you evaluate f with a different argument. In the lambda form, y is re-used:
If the signature of f is monomorphic, then f is a constant applicative form, i.e. global constant. That means y is shared throughout your entire program, and only someCalculation needs to be re-done for each call of f. This is typically ideal performance-wise, though of course it also means that y keeps occupying memory.
If f is polymorphic, then it is in fact implicitly a function of the types you're using it with. That means you don't get global sharing, but if you write e.g. map f longList, then y still needs to be computed only once before getting mapped over the list.
That's the gist of the performance differences. Now, of course GHC can rearrange stuff and since it's guaranteed that the results are the same, it might always transform one form to the other if deemed more efficient. But normally it doesn't.
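The difference can be observed with a small trace experiment. This is only a sketch: fLambda and fArgs are illustrative names, and the sum stands in for expensiveConstCalculation.

```haskell
import Debug.Trace (trace)

-- Lambda form: y is bound outside the lambda, so it belongs to the constant fLambda.
fLambda :: Int -> Int
fLambda = \x -> x + y
  where y = trace "computing y (lambda form)" (sum [1 .. 1000])

-- Argument form: y is scoped inside the function body.
fArgs :: Int -> Int
fArgs x = x + y
  where y = trace "computing y (argument form)" (sum [1 .. 1000])

main :: IO ()
main = do
  print (map fLambda [1, 2, 3])
  print (map fArgs [1, 2, 3])
```

Both lines print the same list. Without optimisation, the lambda-form message should show up once and the argument-form message once per element; with -O, GHC may float y out of fArgs, so the counts can differ from that.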

Haskell - Lambda calculus equivalent syntax?

While writing some lambda functions in Haskell, I was originally writing the functions like:
tru = \t f -> t
fls = \t f -> f
However, I soon noticed from the examples online that such functions are frequently written like:
tru = \t -> \f -> t
fls = \t -> \f -> f
Specifically, each of the items passed to the function have their own \ and -> as opposed to above. When checking the types of these they appear to be the same. My question is, are they equivalent or do they actually differ in some way? And not only for these two functions, but does it make a difference for functions in general? Thank you much!
They're the same; Haskell automatically curries things to keep the syntax nice. The following are all equivalent**
foo a b = (a, b)
foo a = \b -> (a, b)
foo = \a b -> (a, b)
foo = \a -> \b -> (a, b)
-- Or we can simply eta convert leaving
foo = (,)
If you want to be idiomatic, prefer either the first or the last. Introducing unnecessary lambdas is good for teaching currying, but in real code just adds syntactic clutter.
However in raw lambda calculus (not Haskell) most manually curry with
\a -> \b -> a b
Because people don't write a lot of lambda calculus by hand and when they do they tend to stick to unsugared lambda calculus to keep things simple.
** modulo the monomorphism restriction, which won't impact you anyways with a type signature.
Though, as jozefg said, they are themselves equivalent, they may lead to different execution behaviour when combined with local variable bindings. Consider
f, f' :: Int -> Int -> Int
with the two definitions
f a x = μ*x
where μ = sum [1..a]
f' a = \x -> μ*x
where μ = sum [1..a]
These sure look equivalent, and certainly will always yield the same results.
GHCi, version 7.6.2: :? for help
[1 of 1] Compiling Main            ( def0.hs, interpreted )
Ok, modules loaded: Main.
*Main> sum $ map (f 10000) [1..10000]
*Main> sum $ map (f' 10000) [1..10000]
However, if you try this yourself, you'll notice that with f it takes quite a lot of time, whereas with f' you get the result immediately. The reason is that f' is written in a form that prompts GHC to compile it so that f' 10000 is actually evaluated before starting to map it over the list. In that step, the value μ is calculated and stored in the closure of (f' 10000). On the other hand, f is treated simply as "one function of two variables"; (f 10000) is merely stored as a closure containing the parameter 10000, and μ is not calculated at all at first. Only when map applies (f 10000) to each element in the list is the whole sum [1..a] calculated, which takes some time for each element in [1..10000]. With f', this was not necessary because μ was pre-calculated.
In principle, common-subexpression elimination is an optimisation that GHC is able to do itself, so you might at times get good performance even with a definition like f. But you can't really count on it.

Evaluation strategy

How should one reason about function evaluation in examples like the following in Haskell:
let f x = ...
x = ...
in map (g (f x)) xs
In GHC, sometimes (f x) is evaluated only once, and sometimes once for each element in xs, depending on what exactly f and g are. This can be important when f x is an expensive computation. It has just tripped a Haskell beginner I was helping and I didn't know what to tell him other than that it is up to the compiler. Is there a better story?
In the following example (f x) will be evaluated 4 times:
let f x = trace "!" $ zip x x
x = "abc"
in map (\i -> lookup i (f x)) "abcd"
With language extensions, we can create situations where f x must be evaluated repeatedly:
{-# LANGUAGE GADTs, Rank2Types #-}
module MultiEvG where
data BI where
B :: (Bounded b, Integral b) => b -> BI
foo :: [BI] -> [Integer]
foo xs = let f :: (Integral c, Bounded c) => c -> c
f x = maxBound - x
g :: (forall a. (Integral a, Bounded a) => a) -> BI -> Integer
g m (B y) = toInteger (m + y)
x :: (Integral i) => i
x = 3
in map (g (f x)) xs
The crux is to have f x polymorphic even as the argument of g, and we must create a situation where the type(s) at which it is needed can't be predicted (my first stab used an Either a b instead of BI, but when optimising, that of course led to only two evaluations of f x at most).
A polymorphic expression must be evaluated at least once for each type it is used at. That's one reason for the monomorphism restriction. However, when the range of types it can be needed at is restricted, it is possible to memoise the values at each type, and in some circumstances GHC does that (needs optimising, and I expect the number of types involved mustn't be too large). Here we confront it with what is basically an inhomogeneous list, so in each invocation of g (f x), it can be needed at an arbitrary type satisfying the constraints, so the computation cannot be lifted outside the map (technically, the compiler could still build a cache of the values at each used type, so it would be evaluated only once per type, but GHC doesn't, in all likelihood it wouldn't be worth the trouble).
Monomorphic expressions need only be evaluated once, they can be shared. Whether they are is up to the implementation; by purity, it doesn't change the semantics of the programme. If the expression is bound to a name, in practice you can rely on it being shared, since it's easy and obviously what the programmer wants. If it isn't bound to a name, it's a question of optimisation. With the bytecode generator or without optimisations, the expression will often be evaluated repeatedly, but with optimisations repeated evaluation would indicate a compiler bug.
Polymorphic expressions must be evaluated at least once for every type they're used at, but with optimisations, when GHC can see that it may be used multiple times at the same type, it will (usually) still be shared for that type during a larger computation.
Bottom line: Always compile with optimisations, help the compiler by binding expressions you want shared to a name, and give monomorphic type signatures where possible.
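As a sketch of the "bind it to a name" advice, applied to the trace example from the question (the names results and table are introduced here for illustration):

```haskell
import Debug.Trace (trace)

results :: [Maybe Char]
results =
  let f x = trace "!" (zip x x)
      x = "abc"
      table = f x                       -- named and monomorphic: one shared thunk
  in map (\i -> lookup i table) "abcd"

main :: IO ()
main = print results
```

Here the lambda only refers to table, so the "!" should be traced once, however many elements are looked up.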
Your examples are indeed quite different.
In the first example, the argument to map is g (f x) and is passed once to map most likely as partially applied function.
Should g (f x), when applied to an argument within map evaluate its first argument, then this will be done only once and then the thunk (f x) will be updated with the result.
Hence, in your first example, f x will be evaluated at most once.
Your second example requires a deeper analysis before the compiler can arrive at the conclusion that (f x) is always constant in the lambda expression. Perhaps it will never optimize it at all, because it may have knowledge that trace is not quite kosher. So, this may evaluate 4 times when tracing, and 4 times or 1 time when not tracing.
This is really dependent on GHC's optimizations, as you've been able to tell.
The best thing to do is to study the GHC core that you get after optimizing the program. I would look at the generated Core and examine whether f x had its own let statement outside the map or not.
If you want to be sure, then you should factor f x out into its own variable assigned in a let, but there's not really a guaranteed way to figure it out other than reading through Core.
All that said, with the exception of things like trace that use unsafePerformIO, this will never change the semantics of your program: how it actually behaves.
In GHC without optimizations, the body of a function is evaluated every time the function is called. (A "call" means the function is applied to arguments and the result is evaluated.) In the following example, f x is inside a function, so it will execute each time the function is called.
(GHC may optimize this expression as discussed in the FAQ [1].)
let f x = trace "!" $ zip x x
x = "abc"
in map (\i -> lookup i (f x)) "abcd"
However, if we move f x out of the function, it will execute only once.
let f x = trace "!" $ zip x x
x = "abc"
in map ((\f_x i -> lookup i f_x) (f x)) "abcd"
This can be rewritten more readably as
let f x = trace "!" $ zip x x
x = "abc"
g f_x i = lookup i f_x
in map (g (f x)) "abcd"
The general rule is that, each time a function is applied to an argument, a new "copy" of the function body is created. Function application is the only thing that may cause an expression to re-execute. However, be warned that some functions and function calls do not look like functions syntactically.
