Order of evaluation of expressions in Haskell and laziness - haskell

I'm trying to understand laziness properly in Haskell.
I understand it to mean that if we have some expression where we do not actually use a subexpression, then that subexpression will never be evaluated, e.g.
let x = [1..1000] in 0 will never actually evaluate the list but just return 0.
However, what if I have something like the following, where fib n is a Fibonacci function that returns an error for n < 0:
let x = div 100 0 + (20 * 100)   -- division by zero error
let x = fib (-3) + fib 7         -- n < 0 error
Will (20 * 100) and fib 7 ever get evaluated, or will evaluation stop as soon as the first subexpression raises an error?

As per several comments, the language doesn't make many guarantees about the order of evaluation of subexpressions in a program like:
main = print $ div 100 0 + 20 * 100
So, div 100 0 could be evaluated first and throw an error before 20 * 100 is evaluated, or vice versa. Or, the whole expression could be optimized into unconditionally throwing a division by zero error without evaluating anything, which is what actually happens if you compile it with ghc -O2.
In actual fact, at least with GHC 8.6.5, the function:
foo :: Int -> Int -> Int -> Int
foo x y z = div x y + z * x
compiled with ghc -O2 produces code that attempts the division first and will throw an error if y == 0 before attempting the multiplication, so the subexpressions are evaluated in the order they appear.
HOWEVER, the function with the opposite order:
bar :: Int -> Int -> Int -> Int
bar x y z = z * x + div x y
compiled with ghc -O2 ALSO produces code that tries the division first and will throw an error if y == 0 before attempting the multiplication, so the subexpressions are evaluated in reverse order.
Moreover, even though both versions try the division before the multiplication, there's still a difference in their evaluation order -- bar fully evaluates z before trying the division, while foo evaluates the division before fully evaluating z, so if a lazy, error-generating value is passed for z, these two functions will produce different behavior. In particular,
main = print $ foo 1 0 (error "not so fast")
throws a division by zero error while:
main = print $ bar 1 0 (error "not so fast")
says "not so fast". Neither attempts the multiplication, though.
There aren't any simple rules here. The only way to see these differences is to compile with flags that dump intermediate compiler output, like:
ghc -ddump-stg -dsuppress-all -dsuppress-uniques -fforce-recomp -O2 Test.hs
and inspect the generated code.
If you want to guarantee a particular evaluation order, you need to write something like:
import Control.Parallel (pseq)
foo' :: Int -> Int -> Int -> Int
foo' x y z = let a = div x y
                 b = z * x
             in a `pseq` b `pseq` a + b
bar' :: Int -> Int -> Int -> Int
bar' x y z = let a = z * x
                 b = div x y
             in a `pseq` b `pseq` a + b
The function pseq is similar to the seq function discussed in the comments. seq would also work here, but it doesn't always guarantee an evaluation order; pseq is intended to provide that guarantee.
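As a quick sanity check (assuming the foo'/bar' definitions above), the forced order means the multiplication's error should now win for bar' even when the compiler would otherwise reorder the sum:
main :: IO ()
main = print (bar' 1 0 (error "not so fast"))
-- expected: *** Exception: not so fast, since a = z * x is forced first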
If your actual goal is to understand Haskell's lazy evaluation, rather than to prevent specific subexpressions from being evaluated when other subexpressions error out, then I'm not sure looking at these examples will help much. Instead, taking a look at this answer to a related question already linked in the comments may give you a better sense of how laziness "works" conceptually.

In this case the expressions (20 * 100) and fib 7 will be evaluated, but only because the operator (+) happens to evaluate its second argument first here. If you write, for example, (20 * 100) + div 100 0, the part (20 * 100) won't be evaluated. You can check for yourself which argument is evaluated first: try (error "first") + (error "second"), for example.
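For example, in GHCi (which exception you actually see is implementation-dependent, since the language itself does not fix the order):
Prelude> (error "first") + (error "second") :: Int
*** Exception: first
Whether you see "first" or "second" here depends on your GHC version and optimization flags.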

Related

Haskell | Are let expressions recalculated?

Lets say we have this function:
foo n = let comp n      = n * n * n + 10
            otherComp n = (comp n) + (comp n)
        in (otherComp n) + (otherComp n)
How many times will comp n actually get executed, 1 or 4? Does Haskell "store" function results in the scope of a let?
In GHCi, without optimization, four times.
> import Debug.Trace
> :{
| f x = let comp n      = trace "A" n
|           otherComp n = comp n + comp n
|       in otherComp x + otherComp x
| :}
> f 10
A
A
A
A
40
With optimization, GHC might be able to inline the functions and optimize everything away. In the general case, however, I would not count on GHC to optimize multiple calls into one. That would require memoization and/or CSE (common subexpression elimination), which is not always an optimization, so GHC is quite conservative about applying it.
As a rule of thumb, when evaluating performance, expect each (evaluated) call in the code to correspond to an actual call at runtime.
The above discussion applies to function bindings only. For simple pattern bindings consisting of just a variable, like
let x = g 20
in x + x
then g 20 will be computed once, bound to x, and x + x will reuse the same value twice. With one proviso: x must be assigned a monomorphic type.
If x is instead assigned a polymorphic type with a typeclass constraint, it acts as a function in disguise.
> let x = trace "A" (200 * 350)
> :t x
x :: Num a => a
> x + x
A
A
140000
Above, 200 * 350 has been recomputed twice, since it got a polymorphic type.
This mostly happens only in GHCi. In regular Haskell source files, GHC uses the Dreaded Monomorphism Restriction to give x a monomorphic type, precisely to avoid recomputation of variables. If that cannot be done, and duplicate computation is needed, GHC prefers to raise an error rather than silently cause recomputation. (In GHCi, the DMR is disabled to make more code work as-is, and recomputation happens, as seen above.)
Summing up: variable bindings let x = ... should be fine in source code, and work as expected without duplicating computation. If you want to be completely sure, annotate x with an explicit monomorphic type annotation.
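For instance, here is a minimal sketch (a compiled module rather than GHCi, reusing Debug.Trace as above) showing that an explicit monomorphic annotation gives you sharing:
import Debug.Trace

x :: Int              -- explicit monomorphic type: computed at most once
x = trace "A" (200 * 350)

main :: IO ()
main = print (x + x)  -- prints "A" a single time, then 140000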

Unresolved top level overloading

The task is to find all two-digit numbers representable as the sum of the square roots of two natural numbers.
I tried this:
func = [sqrt (x) + sqrt (y) | x <- [10..99], y <- [10..99], sqrt (x) `mod` 1 == 0, sqrt (y) `mod` 1 == 0]
Result:
Unresolved top-level overloading Binding : func
Outstanding context : (Integral b, Floating b)
How can I fix this?
This happens because of a conflict between these two types:
sqrt :: Floating a => a -> a
mod :: Integral a => a -> a -> a
Because you write mod (sqrt x) 1, and sqrt is constrained to return the same type as it takes, the compiler is left trying to find a type for x that simultaneously satisfies the Floating constraint of sqrt and the Integral constraint of mod. There are no types in the base library that satisfy both constraints.
A quick fix is to use mod' :: Real a => a -> a -> a:
import Data.Fixed
func = [sqrt (x) + sqrt (y) | x <- [10..99], y <- [10..99], sqrt (x) `mod'` 1 == 0, sqrt (y) `mod'` 1 == 0]
However, from the error you posted, it looks like you may not be using GHC, and mod' may not be available in your implementation. In that case you can copy its definition (and the definition of the helper function div') from the source of Data.Fixed.
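For reference, the definitions are short enough to copy into your own module; a self-contained version looks like this:
-- Portable definitions matching those in Data.Fixed
div' :: (Real a, Integral b) => a -> a -> b
div' n d = floor (toRational n / toRational d)

mod' :: Real a => a -> a -> a
mod' n d = n - fromInteger (div' n d) * d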
But I recommend a more involved fix. The key observation is that if x = sqrt y, then x*x = y, so we can avoid calling sqrt at all. Instead of iterating over numbers and checking if they have a clean sqrt, we can iterate over square roots; their squares will definitely have clean square roots. A straightforward application of this refactoring might look like this:
sqrts = takeWhile (\n -> n*n <= 99)
      . dropWhile (\n -> n*n < 10)
      $ [0..]
func = [x + y | x <- sqrts, y <- sqrts]
Of course, func is a terrible name (it's not even a function!), and sqrts is a constant we could compute ourselves, and is so short we should probably just inline it. So we might then simplify to:
numberSums = [x + y | x <- [4..9], y <- [4..9]]
At this point, I would be wondering whether I really wanted to write this at all, preferring just
numberSums = [8..18]
which, unlike the previous iteration, doesn't have any duplicates. It has lost all of the explanatory power of why this is an interesting constant, though, so you would definitely want a comment.
-- sums of pairs of numbers, each of whose squares lies in the range [10..99]
numberSums = [8..18]
This would be my final version.
Also, although the above definitions were not parameterized by the range to search for perfect squares in, all the proposed refactorings can be applied when that is a parameter; I leave this as a good exercise for the reader to check that they have understood each change.

Expression Evaluation In Haskell: Fixing the type of a sub-expression causes parent expression to be evaluated to different degrees

I am not able to explain the following behavior:
Prelude> let x = 1 + 2
Prelude> let y = (x,x)
Prelude> :sprint y
y = _
Now when I specify a type for x:
Prelude> let x = 1 + 2 ::Int
Prelude> let y = (x,x)
Prelude> :sprint y
y = (_,_)
Why does the specification of x's type force y to its weak head normal form (WHNF)?
I accidentally discovered this behavior while reading Simon Marlow's Parallel and Concurrent Programming In Haskell.
Here's an informed guess. In your first example,
x :: Num a => a
So
y :: Num a => (a, a)
In GHC Core, this y is a function that takes a Num dictionary and gives a pair. If you were to evaluate y, then GHCi would default it for you and apply the Integer dictionary. But from what you've shown, it seems likely that doesn't happen with :sprint. Thus you don't yet have a pair; you have a function that produces one.
When you specialize to Int, the dictionary is applied to x, so you get
x :: Int
y :: (Int, Int)
Instead of a function from a dictionary, x is now a thunk. Now no dictionary needs to be applied to evaluate y! y is just the application of the pair constructor to two pointers to the x thunk. Applying a constructor doesn't count as computation, so it's never delayed lazily.
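You can watch this sharing directly once the type is monomorphic: forcing one component of the pair updates both, since they point at the same thunk.
Prelude> let x = 1 + 2 :: Int
Prelude> let y = (x,x)
Prelude> :sprint y
y = (_,_)
Prelude> fst y
3
Prelude> :sprint y
y = (3,3)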

Time cost of Haskell `seq` operator

This FAQ says that
The seq operator is
seq :: a -> b -> b
x `seq` y will evaluate x, enough to check that it is not bottom, then
discard the result and evaluate y. This might not seem useful, but it
means that x is guaranteed to be evaluated before y is considered.
That's awfully nice of Haskell, but does it mean that in
x `seq` f x
the cost of evaluating x will be paid twice ("discard the result")?
The seq function will discard the value of x, but since the value has been evaluated, all references to x are "updated" to no longer point to the unevaluated version of x, but to instead point to the evaluated version. So, even though seq evaluates and discards x, the value has been evaluated for other users of x as well, leading to no repeated evaluations.
No, it's not compute and forget, it's compute - which forces caching.
For example, consider this code:
let x = 1 + 1
in x + 1
Since Haskell is lazy, this evaluates to ((1 + 1) + 1). A thunk, containing the sum of a thunk and one, the inner thunk being one plus one.
Let's use javascript, a non-lazy language, to show what this looks like:
function(){
  var x = function(){ return 1 + 1 };
  return x() + 1;
}
Chaining together thunks like this can cause stack overflows, if done repeatedly, so seq to the rescue.
let x = 1 + 1
in x `seq` (x + 1)
I'm lying when I tell you this evaluates to (2 + 1), but that's almost true - it's just that the calculation of the 2 is forced to happen before the rest happens (but the 2 is still calculated lazily).
Going back to javascript:
function(){
  var x = function(){ return 1 + 1 };
  return (function(x){
    return x + 1;
  })( x() );
}
I believe x will only be evaluated once (and the result retained for future use, as is typical for lazy operations). That behavior is what makes seq useful.
You can always check with unsafePerformIO or trace…
import System.IO.Unsafe (unsafePerformIO)
main = print (x `seq` f (x + x))
  where
    f = (+4)
    x = unsafePerformIO $ print "Batman!" >> return 3
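Running that program prints "Batman!" exactly once, followed by 10: seq forces x, and the later uses of x inside f (x + x) reuse the already-evaluated value.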
Of course seq by itself does not "evaluate" anything. It just records the forcing order dependency. The forcing itself is triggered by pattern-matching. When seq x (f x) is forced, x will be forced first (memoizing the resulting value), and then f x will be forced. Haskell's lazy evaluation means it memoizes the results of forcing of expressions, so no repeat "evaluation" (scary quotes here) will be performed.
I put "evaluation" into scary quotes because it implies full evaluation. In the words of Haskell wikibook, "Haskell values are highly layered; 'evaluating' a Haskell value could mean evaluating down to any one of these layers."
Let me reiterate: seq by itself does not evaluate anything. seq x x does not evaluate x under any circumstance. seq x (f x) does not evaluate anything when f = id, contrary to what the report seems to have been saying.
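A small GHCi sketch of that last point: building a seq expression performs no evaluation by itself; only demanding the result does.
Prelude> let x = error "boom" :: Int
Prelude> let y = x `seq` x
Prelude> :sprint y
y = _
Prelude> y
*** Exception: boom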

Haskell returns how many inputs are larger than their average value

I'm very new to Haskell, writing a simple function that returns how many of its inputs are larger than their average value. I got this error:
ERROR file:.\AverageThree.hs:5 - Type error in application
*** Expression     : x y z
*** Term           : x
*** Type           : Int
*** Does not match : a -> b -> c
Code:
averageThree :: Int -> Int -> Int -> Float
averageThree x y z = (fromIntegral x + fromIntegral y + fromIntegral z) / 3

howManyAverageThree :: Int -> Int -> Int -> Int
howManyAverageThree x y z = length > averageThree
Can anyone help me?
The trouble you're having comes from a few places.
First, you aren't applying either function, length or averageThree - and hence also not using your arguments to howManyAverageThree.
Second, the type of length is [a] -> Int. As you don't have a list here, you either have to use a different function, or make a list.
If I understand your desired algorithm correctly, you are going to need to do a few things:
Apply x, y, and z to averageThree.
Use the filter function, comparing this computed average with each passed in parameter; this will result in a list.
Find the length of the resulting list.
The code I dashed off to do this follows:
howManyAverageThree :: Int -> Int -> Int -> Int
howManyAverageThree x y z = length $ filter (> avg) the_three
  where avg       = averageThree x y z
        the_three = [fromIntegral x, fromIntegral y, fromIntegral z]
This takes advantage of a couple of neat features:
Operator sections, a form of partial function application made possible by currying. That's what I was using with (> avg); normally, the infix function > takes two parameters of the same type and returns a Bool. By wrapping it in parentheses and providing an expression on one side, I have partially applied it, producing a one-argument predicate that can be used as a filter function.
The where keyword. I used this to clean it all up a little and make it more readable.
The filter function, which I mentioned above.
Function application using $. This low-precedence, right-associative application operator simply lets you drop the parentheses around the argument, as the comparison below shows.
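To make the $ point concrete, these two ways of writing the body are interchangeable:
length $ filter (> avg) the_three
length (filter (> avg) the_three)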
There are a number of problems here:
length doesn't do what you want it to. length returns the length of a list, and there are no lists in your howManyAverageThree.
averageThree returns a Float. howManyAverageThree needs to account for that. Specifically, > needs its arguments to be of the same type.
The call to averageThree in the second function needs some arguments.
Here's a working version:
howManyAverageThree x y z = length [i | i <- [x, y, z], fromIntegral i > avg]
  where avg = averageThree x y z
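A quick usage check of either version (using averageThree from the question):
> howManyAverageThree 1 2 9
1
The average here is 4.0, and only 9 exceeds it.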
