Function with strict arguments - Haskell

A corrected quiz in my textbook is asking me how many of f's arguments are strict, f being:
f x 0 z = x == z
f x y z = x
My initial thought was that all of f's arguments should be considered strict, since y is evaluated to check whether it's equal to 0, and x and z are compared to check whether they're equal.
And yet the answer is that only x and y are strict.
Any clues as to why?

First of all, you need a very precise definition of "strict" in order for this to make sense. A function f is strict iff evaluating f x to WHNF (weak head normal form) causes x to be evaluated to WHNF. The interaction this has with currying is a bit awkward, and I'm going to ignore some of the potential weirdness it introduces.
Assuming the type here is f :: Bool -> Int -> Bool -> Bool, your analysis of the behavior w.r.t. y is correct: evaluating f x y z to WHNF will always require evaluating y to determine which equation to choose. As y is the only factor determining which equation is used, we have to split the analysis for x and z by equation. In the first equation, evaluating the result to WHNF results in both x and z being evaluated. In the second equation, evaluating the result to WHNF results in evaluating x to WHNF.
Since x is evaluated in both branches, this function is strict in x. This is a little bit amusing - it's strict in the way id is strict. But that's still valid! z, however, is a different story. Only one of the branches causes z to be evaluated, so it's not evaluated strictly - it's only evaluated on demand. Usually we talk about this happening when evaluation is guarded behind a constructor or when a function is applied and the result isn't evaluated, but being conditionally evaluated is sufficient. f True 1 undefined evaluates to True. If f were strict in z, that would have to evaluate to undefined.

It turns out that whether f is strict in its second argument depends on what type it gets resolved to.
Here's proof:
data ModOne = Zero

instance Eq ModOne where
  _ == _ = True -- after all, they're both Zero, right?

instance Num ModOne -- the method implementations literally don't matter

f x 0 z = x == z
f x y z = x
Now in ghci:
> f True (undefined :: ModOne) True
True
> f True (undefined :: Int) True
*** Exception: Prelude.undefined
And, in a related way, whether f is strict in its third argument depends on what values you pick for the first two. Proof, again:
> f True 1 undefined
True
> f True 0 undefined
*** Exception: Prelude.undefined
So, there isn't really a simple answer to this question! f is definitely strict in its first argument; but the other two are conditionally one or the other depending on circumstances.

Related

Why is it not possible to define an infix operator via an equation on a section?

Hutton's "Programming in Haskell", first edition, says that the concatenation operator ++ could be defined as:
(++ ys) = foldr (:) ys
This makes logical sense.
I had never seen an operator being defined by an equation on one of its sections (in this case (++ ys)), so I tried it myself:
(##) :: [a] -> [a] -> [a]
(## ys) = foldr (:) ys
However, this doesn't compile, highlighting a syntax error in (## ys).
Has this never been a feature, or has it been removed at some point? If so, why?
I know I could write the above as:
xs ## ys = foldr (:) ys xs
But I find the point-free style more elegant.
This would result in some subtle inconsistencies. Although we tend to think of curried and flipped and uncurried functions as just different ways of writing the same thing, that is not quite true when it comes to the actual evaluation strategy. Consider
(#>) :: Integer -> Integer -> Integer
(#>) n = let p = {- the `n`-th prime number -} `mod` 74
         in (p+)
Indexing prime numbers is costly. If you write something like
map ((2^43) #>) [100 .. 150]
then the 2^43-th prime number needs to be computed only once. By contrast, if I define
(<#) :: Integer -> Integer -> Integer
(<#) = flip (#>)
then writing map (<# (2^43)) [100 .. 150] would compute the prime number over and over again, because Haskell doesn't support partially applying functions on their second argument.
With the flip (#>) definition this isn't too surprising, but if I could have defined the flipped form directly as
(<# n) = let p = {- the `n`-th prime number -} `mod` 74
         in (p+)
then one could reasonably expect that map (<# (2^43)) does share the prime computation, but to support that Haskell's partial evaluation semantics would need to track more information than they currently do, and if we want this to work reliably then it would probably incur some other disadvantages.
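Here's a small way to observe that sharing difference in GHCi (a sketch; the prime lookup is replaced by a cheap stand-in, and trace marks when the "expensive" part actually runs; the behavior as described holds without optimizations, since -O's full-laziness pass may float the computation out):
import Debug.Trace (trace)

(#>) :: Integer -> Integer -> Integer
(#>) n = let p = trace "computing!" (n `mod` 74) -- stand-in for the costly prime lookup
         in (p +)

(<#) :: Integer -> Integer -> Integer
(<#) = flip (#>)

-- ghci> sum (map ((2^43) #>) [100 .. 150])  -- "computing!" printed once
-- ghci> sum (map (<# (2^43)) [100 .. 150])  -- "computing!" printed for every element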
I think there's a simpler explanation, to do with how complex the allowed syntactic forms on the lhs of an = binding already are.
Please always post the error message you're getting; don't just say "highlighting a syntax error". The message might not mean a lot to you, but in this case it gives a strong hint:
(## ys) = ...
===> error: Parse error in pattern: ##ys
(xs ##) = ...
===> error: Expression syntax in pattern: xs ##
"in pattern" aha! That is, the lhs is potentially a syntactic pattern. Furthermore there might not be a signature decl for whatever you're introducing; even if there is, the compiler has to check your equation against the signature, so it can't assume anything about the arity of what you're introducing. Consider these valid equations
z = 42 -- z counts as a pattern
Just z = {- long and complex expr returning a Maybe, binds z at module-wide scope -}
(Just z) = {- same same, binds z at module-wide scope -}
foo x = ... -- foo gets module-wide scope but not x
(foo x) = ... -- same
bar x y = ... -- bar gets module-wide scope but not x, y
(bar x) y = ... -- same
(x ## y) z = ... -- accepted, introduces triadic operator ##
x ## y z = -- rejected error: Parse error in pattern: y
(x ##) y = -- rejected error: Expression syntax in pattern: x ##
(## y) z = -- rejected error: Parse error in pattern: ##y
The Language Report (section 4.4.3 Function and Pattern Bindings) has
decl   -> (funlhs | pat) rhs
funlhs -> var apat { apat }
        | pat varop pat
        | ( funlhs ) apat { apat }
So the lhs is not a place where expression syntax (including operator sections) can appear. See also the ugly detail at the end of section 4.4.3.1 to do with using lhs operator syntax in combination with an infix data constructor. Ugh!
The last sentence here also confirms you can't use operator sections on lhs.
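For completeness: the point-free style the asker wanted is still attainable without a section on the lhs. A minimal sketch, equivalent to xs ## ys = foldr (:) ys xs:
(##) :: [a] -> [a] -> [a]
(##) = flip (foldr (:))
Here flip moves the first list into foldr (:)'s second position, so no section needs to appear on the lhs.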

How is Haskell's seq used?

So, Haskell's seq function forces the evaluation of its first argument and returns the second; consequently it gets used as an infix operator. If you want to force the evaluation of an expression, intuitively such a feature would be a unary operator. So, instead of
seq :: a -> b -> b
it would be
seq :: a -> a
Consequently, if the value you want is a, why return b, and how do you construct the b to return? Clearly, I am not thinking Haskell. :)
The way to think about a `seq` b is not that it "evaluates a" but that it creates a dependency between a and b, so that when you go to evaluate b you evaluate a as well.
This means, for example, that a `seq` a is completely redundant: you're telling Haskell to evaluate a when you evaluate a. By the same logic, seq a with just one argument would not be any different than simply writing a by itself.
Just having seq a that somehow evaluates a would not work. The problem is that seq a is itself an expression that might not be evaluated—it might be deep inside some nested thunks, for example. So it would only become relevant when you get to evaluating the whole seq a expression—at which point you would have been evaluating a by itself anyhow.
@Rhymoid's example of how it's used in a strict fold (foldl') is good. Our goal is to write a fold such that its intermediate accumulated value (acc) is completely evaluated at each step as soon as we evaluate the final result. This is done by adding a seq between the accumulated value and the recursive call:
foldl' f z (x:xs) =
  let z' = f z x in z' `seq` foldl' f z' xs
You can visualize this as a long chain of seqs between each application of f in the fold, connecting all of them to the final result. This way, when you evaluate the final expression (i.e. the number you get by summing a list), it evaluates the intermediate values (i.e. the partial sums as you fold through the list) strictly.
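As a concrete sketch, foldl' (+) 0 [1,2,3] unfolds by hand into exactly such a chain:
sumChain :: Int
sumChain =
  let z1 = 0  + 1 in z1 `seq`
  let z2 = z1 + 2 in z2 `seq`
  let z3 = z2 + 3 in z3 `seq`
  z3 -- evaluating sumChain forces each partial sum in turn, yielding 6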

Programming style in OCaml

I have a question about the correct way to write efficient functional programs. Suppose I'm given a list s of positive ints, and I want to find the minimum element (or just 0 if empty). Then a generic functional program for doing this would look like
minList s = match s with
  | [] -> undefined
  | [x] -> x
  | x :: t -> min x (minList t)
In a lazy language one can make this more efficient by adding an extra clause which terminates the recursion if a zero is found; this way s is only computed up to the first zero:
minList s = match s with
  | [] -> undefined
  | [x] -> x
  | x :: t -> if x == 0 then 0 else min x (minList t)
However, am I correct in believing that this sort of trick would not work in a strict evaluation language like OCaml, which would evaluate the whole of s before running minList? If so, what would be the correct way to optimize this in OCaml?
ADDITIONAL QUESTION: OK, so I understand that if statements are always lazy. But what about the following, for example: I have a function on int lists again which first checks whether or not the ith element is zero, i.e.
f s = if s(i)==0 then 0 else g s
Here the input sequence s is present in both clauses of the if statement, but clearly for an efficient computation you would only want to evaluate s(i) in the first case. Here, would OCaml always evaluate all of s, even if the first case succeeds?
if expressions in OCaml don't follow the strict evaluation rule.
Like || and &&, they are lazily evaluated.
See this link: if expressions
In a strictly evaluated language, the whole list s would be evaluated. Still,
let rec minList s = match s with
  | [] -> 0
  | x :: t -> if x = 0 then 0 else min x (minList t)
would not scan the whole list if a 0 is found.
The if construct has "non-strict" semantics, in that it will evaluate only one branch, and not both. This holds in both strict and non-strict languages.
An actual difference would be when calling a "user defined if" such as (using Haskell syntax):
myIf :: Bool -> a -> a -> a
myIf b x y = if b then x else y
In a non strict language, calling myIf True 3 (nonTerminatingFunction ()) would yield 3, while in a strict language the same expression would loop forever.
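A runnable sketch of that difference, in Haskell (loop is a hypothetical non-terminating function):
myIf :: Bool -> a -> a -> a
myIf b x y = if b then x else y

loop :: () -> Integer
loop () = loop () -- never terminates

main :: IO ()
main = print (myIf True 3 (loop ())) -- prints 3 under lazy evaluation; loops forever under strict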
First of all, the minimum of an empty list is undefined, not 0. This makes sense; otherwise, with the recursion bottoming out at 0 for the empty list, minList [1,2,3] would be min 1 (min 2 (min 3 0)) = 0, which is clearly not true. This is what ghci has to say:
Prelude> minimum []
*** Exception: Prelude.minimum: empty list
Hence your function should be written as:
let rec minList (x::t) = min x (minList t)
There are some problems with this definition though:
It will still give an error because there's no pattern match for the empty list.
It is not tail recursive.
It doesn't stop if the head is 0.
So here's a better solution:
let rec minimum x xs = match x, xs with
  | 0, _ -> 0
  | x, [] -> x
  | x, y :: ys -> minimum (min x y) ys

let minList = function
  | [] -> failwith "No minimum of empty list"
  | x :: t -> minimum x t
The advantage of writing it like this is that minimum is tail recursive, so it will not grow the stack. In addition, as soon as the running minimum hits 0 it immediately returns 0.
Lazy evaluation plays no role here.
In almost every modern programming language:
for expr1 && expr2, if expr1 is already false, then expr2 won't be evaluated.
for expr1 || expr2, if expr1 is already true, then expr2 won't be evaluated.
OCaml does this too.

Lazy Evaluation and Strict Evaluation Haskell

I understand what lazy evaluation is, how it works, and the advantages it has, but could you explain to me what strict evaluation really is in Haskell? I can't seem to find much info about it, since lazy evaluation is the best known.
What are the benefits of each of them over the other? When is strict evaluation actually used?
Strictness happens in a few ways in Haskell.
First, a definition: a function f is strict if and only if, when its argument a doesn't terminate, neither does f a. Nonstrict (sometimes called lazy) is just the opposite of this.
You can be strict in an argument either by pattern matching:
-- strict
foo True  = 1
foo False = 1

-- vs. nonstrict
foo _ = 1
With the second definition we never need to evaluate the argument, so we could pass something like foo (let x = x in x) and it'd still just return 1. With the first one, however, the function needs to see what value the input is so it can run the appropriate branch, thus it is strict.
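A quick check in GHCi (a sketch; the two versions are renamed fooStrict and fooLazy so they can coexist, and undefined is used instead of the loop for a visible effect):
fooStrict :: Bool -> Int
fooStrict True  = 1
fooStrict False = 1

fooLazy :: Bool -> Int
fooLazy _ = 1

-- ghci> fooLazy undefined
-- 1
-- ghci> fooStrict undefined
-- *** Exception: Prelude.undefined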
If we can't pattern match for whatever reason, then we can use a magic function called seq :: a -> b -> b. seq basically stipulates that whenever it is evaluated, it will evaluate a to what's called weak head normal form (WHNF).
You may wonder why it's worth it. Let's consider a case study: foldl vs foldl'. foldl is lazy in its accumulator, so it's implemented something like
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f accum []     = accum
foldl f accum (x:xs) = foldl f (f accum x) xs
Notice that since we're never strict in accum, we'll build up a huge chain of thunks: f (... (f (f accum x1) x2) ...) xn
Not a happy prospect, since this will lead to memory issues. Indeed:
*> foldl (+) 0 [1..500000000]
*** Exception: stack overflow
Now what'd be better is if we forced evaluation at each step, using seq
foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' f accum []     = accum
foldl' f accum (x:xs) = let accum' = f accum x
                        in accum' `seq` foldl' f accum' xs
Now we force the evaluation of accum at each step, making it much faster. This makes foldl' run in constant space instead of overflowing the stack like foldl.
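For comparison, the strict version gets through the same input in constant space (GHCi sketch, after import Data.List (foldl')):
*> foldl' (+) 0 [1..500000000]
125000000250000000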
Now, seq only evaluates its argument to weak head normal form; sometimes we want things to be evaluated fully, to normal form. For that we can use a library/type class:
import Control.DeepSeq -- a library on hackage
deepseq :: NFData a => a -> b -> b
This forces a to be fully evaluated, so:
*> [1, 2, error "Explode"] `seq` 1
1
*> [1, 2, error "Explode"] `deepseq` 1
*** Exception: Explode
*> undefined `seq` 1
*** Exception: Prelude.undefined
*> undefined `deepseq` 1
*** Exception: Prelude.undefined
So deepseq fully evaluates its first argument. This is very useful in parallel programming, for example, where you want to fully evaluate something on one core before it's sent back to the main thread; otherwise you'd just create a thunk and all the actual computation would still be sequential.
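As a sketch of that parallel use case (assuming the parallel package; parMap rdeepseq forces each result to normal form on its own spark):
import Control.Parallel.Strategies (parMap, rdeepseq)

-- each sum is fully evaluated in parallel rather than returned as a thunk
sums :: [Integer]
sums = parMap rdeepseq (\n -> sum [1 .. n]) [1000000, 2000000, 3000000]

main :: IO ()
main = print sums -- build with -threaded and run with +RTS -N to parallelize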

Haskell: foldl' accumulator parameter

I've been asking a few questions about strictness, but I think I've missed the mark before. Hopefully this is more precise.
Let's say we have:
import Data.List (foldl')

n = 1000000
f z = foldl' (\(x1, x2) y -> (x1 + y, y - x2)) z [1..n]
Without changing f, what should I set
z = ...
so that f z does not overflow the stack (i.e. runs in constant space regardless of the size of n)?
It's okay if the answer requires GHC extensions.
My first thought is to define:
g (a1, a2) = (!a1, !a2)
and then
z = g (0, 0)
But I don't think g is valid Haskell.
So your strict foldl' is only going to evaluate the result of your lambda at each step of the fold to weak head normal form (WHNF), i.e. it is only strict in the outermost constructor. Thus the tuple constructor will be evaluated; however, the additions inside the tuple may build up as thunks. This in-depth answer actually seems to address your exact situation here.
W.r.t. your g: you are thinking of the BangPatterns extension, which would look like
g (!a1, !a2) = (a1, a2)
and which evaluates a1 and a2 to WHNF before returning them in the tuple.
What you want to be concerned about is not your initial accumulator, but rather your lambda expression. This would be a nice solution:
f z = foldl' (\(!x1, !x2) y -> (x1 + y, y - x2)) z [1..n]
EDIT: After noticing your other questions I see I didn't read this one very carefully. Your goal is to have "strict data" so to speak. Your other option, then, is to make a new tuple type that has strictness tags on its fields:
data Tuple a b = Tuple !a !b
Then when you pattern match on Tuple a b, a and b will be evaluated.
You'll need to change your function regardless.
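Putting the suggested lambda fix together as a complete sketch (BangPatterns is needed for the bangs in the pattern):
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

n :: Integer
n = 1000000

f :: (Integer, Integer) -> (Integer, Integer)
f z = foldl' (\(!x1, !x2) y -> (x1 + y, y - x2)) z [1 .. n]

main :: IO ()
main = print (f (0, 0)) -- runs in constant space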
There is nothing you can do without changing f. If f were overloaded in the type of the pair you could use strict pairs, but as it stands you're locked in to what f does. There's some small hope that the compiler (strictness analysis and transformations) can avoid the stack growth, but nothing you can count on.
