How to define multiple patterns in Frege? - haskell

I'm having some trouble defining a function in Frege that uses multiple patterns. Basically, I'm defining a mapping by iterating through a list of tuples. I've simplified it down to the following:
foo :: a -> [(a, b)] -> b
foo _ [] = [] --nothing found
foo bar (baz, zab):foobar
| bar == baz = zab
| otherwise = foo bar foobar
I get the following error:
E morse.fr:3: redefinition of `foo` introduced line 2
I've seen other examples like this that do use multiple patterns in a function definition, so I don't know what I'm doing wrong. Why am I getting an error here? I'm new to Frege (and new to Haskell), so there may be something simple I'm missing, but I really don't think this should be a problem.
I'm compiling with version 3.24-7.100.

This is a purely syntactical problem that affects newcomers to languages of the Haskell family. It won't take long until you internalize the rule that function application has higher precedence than any infix operator.
This has consequences:
Complex arguments of function application need parentheses.
In infix expressions, function applications on either side of the operator do not need parentheses (however, individual components of function application may still need them).
In Frege, in addition, the following rule holds:
The syntax of function application and infix expressions on the left hand side of a definition is identical to the one on the right hand side as far as lexemes allowed on both sides are concerned. (This holds in Haskell only when @ and ~ are not used.)
This is so you can define an addition function like this:
data Number = Z | Succ Number
a + Z = a
a + Succ b = Succ a + b
Hence, when you apply this to your example, you see that syntactically, you're going to redefine the : operator. To achieve what you want, you need to write it thus:
foo bar ((baz, zab):foobar) = ....
--      ^                 ^
This corresponds to the situation where you apply foo to a list you are constructing:
foo 42 (x:xs)
When you write
foo 42 x:xs
this means
(foo 42 x):xs
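Putting this together, a complete version of the lookup might look like the sketch below. It is written as plain Haskell, on the assumption that the same clauses are accepted by Frege. The name lookupPair, the Eq a constraint and the Maybe b result are assumptions added here: the question's version compares keys with == and returns [] where a b is expected, so some way of signalling "nothing found" is needed.

```haskell
-- A sketch of the lookup from the question, with the cons pattern parenthesized.
-- The Maybe result is an assumption: it stands in for "nothing found".
lookupPair :: Eq a => a -> [(a, b)] -> Maybe b
lookupPair _ [] = Nothing
lookupPair key ((k, v) : rest)   -- parentheses around the whole (k, v) : rest pattern
  | key == k  = Just v
  | otherwise = lookupPair key rest
```

Without the outer parentheses, the second clause would again be read as a definition of the : operator.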

Related

Pass more than 1 parameter to monad

I'm learning Haskell and making up some examples. I'm not sure why the second example doesn't work
foo :: Int -> Int -> Maybe Int
foo 0 0 = Nothing
foo a b = Just $ a + b
bar :: Int -> Maybe Int
bar 0 = Nothing
bar a = Just $ a + 1
-- This works
Just 4 >>= bar
-- Why this doesn't work?
(Just 4 Just 4) >>= foo
-- This works
do
a <- Just 3
b <- Just 4
foo a b
As the comment says, (Just 4 Just 4) tries to apply the constructor Just to 3 arguments when it only takes one. So, I will assume that you wanted something like (Just 4, Just 4), and want it to work like your final example.
The type of the "bind" operator is (>>=) :: Monad m => m a -> (a -> m b) -> m b. This means that the function expected after the operator only takes one argument, not two. So, again, the ultimate reason why it doesn't work is that your function takes the wrong number of arguments. (Partial application means that you don't have to provide all the arguments at once, but it sounds like you're expecting some other piece of data to be magically routed to the missing argument...)
Desugaring your do example to >>= form translates as:
Just 3 >>= \a -> Just 4 >>= \b -> foo a b
To make this a little clearer, I'll parenthesize the lambdas:
Just 3 >>= ( \a -> Just 4 >>= (\b -> foo a b) )
That makes it easier to see that you can simplify the inner lambda:
Just 3 >>= ( \a -> Just 4 >>= foo a )
So, it's possible after all to route the missing data to the extra argument! But, you do have to work out the routing yourself...
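As a quick check of the desugaring above, the following sketch defines the do version and the explicit >>= version side by side. The names viaDo and viaBind are made up for this illustration; foo is the function from the question.

```haskell
foo :: Int -> Int -> Maybe Int
foo 0 0 = Nothing
foo a b = Just $ a + b

-- The do block from the question.
viaDo :: Maybe Int
viaDo = do
  a <- Just 3
  b <- Just 4
  foo a b

-- The same computation with explicit binds; `foo a` is foo partially
-- applied to its first argument, so it has type Int -> Maybe Int.
viaBind :: Maybe Int
viaBind = Just 3 >>= \a -> Just 4 >>= foo a
```

Both definitions evaluate to Just 7.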
There's nothing particularly magical about Haskell functions; they tend to be more particular about how they're called than dynamic languages. The largest "magic" here is that the type checker can often tell when you're not using them correctly.
And (as the other answer notes) there is nothing magical about >>= -- it's just another function, and in order to understand how to use it, you need to take a look at its type.
It doesn't work because >>= is a perfectly normal operator (and operators are perfectly normal functions).
You seem to be thinking of >>= as special syntax for getting values out of the monadic value on its left and feeding it to the function on the right. It is not special syntax; rather >>= itself is a function that gets applied to the values on its left and its right (and then computes a result as you expect).
However, that means that the left and right arguments must be valid expressions for things that could exist as ordinary values; things you could simply bind to variables with var = <expr> syntax. Just 4 >>= bar works because (among other requirements) Just 4 on its own is a valid expression of type Maybe Int and bar is a valid expression of type Int -> Maybe Int. Just 4 Just 4 >>= foo doesn't work because Just 4 Just 4 is not a correct expression (what would its type be?); it's interpreted as applying Just to the 3 separate arguments 4, Just, and 4, whereas you want it to be two separate values Just 4 and Just 4. But even if you could get the compiler to interpret something there as two separate values, there's no way for >>= to be passed two separate values as its left argument; it's expecting (in this usage) a single value of type Maybe Int.
If you have a function like foo that needs two arguments and you want to source those arguments from values that are in a monadic context, then you can't just apply >>=; you need to write code that does that (like your final example with the do block; there are many other ways to do something equivalent).
The other answers described why this doesn't work. But IMO it's quite reasonable that you want this, and indeed Just 3 >>= \x -> Just 4 >>= \y -> foo x y is a bit of a silly solution to the task. Basically, the x and y values are independent of each other, yet you're fetching them sequentially, in a way that the complete y calculation could in principle depend on the value of x.
Monads aren't really the right abstraction here; they're too strong. To get x and y non-sequentially, you can use the Applicative interface. The form that most Haskellers prefer nowadays (I think) is
foo <$> Just 3 <*> Just 4
You can read this as “zip the effectful values Just 3 and Just 4 together to a single action with two values, then apply foo over those values”.
...Actually that's not really how it works though, and for me that was super confusing when I first learned about applicatives. Namely, the above expression is in fact parsed as
(foo <$> Just 3) <*> Just 4
which looks again like it's sequential-style. But it's not; what's going on here is only a currying/laziness trick to pass multiple values through the applicative value without having to group them into a suitable tuple. The code that literally works like I explained it would be
uncurry foo <$> ((,)<$>Just 3<*>Just 4)
Here, (,)<$>Just 3<*>Just 4 evaluates to Just (3,4). Then fmapping foo over that needs to be done in uncurried form, so the two arguments are accepted as a tuple. It's structurally clear, yet awkward because we're working against Haskell's curried style.
(Mathematically, this tupling is what's conceptually happening though: generally speaking, you're working in a monoidal category. Some other incarnations of applicative functors have such a tupling combinator as their underlying interface, instead of <*>; e.g. >*< from the invertible package.)
The trick with foo<$>Just 3<*>Just 4 is that instead of building a tuple, we start with partially applying foo to the 3 result. This doesn't actually require anything applicative/monadic yet – we're basically just transforming the contained value – in general: values – from 3 to foo 3, without touching their context. You may consider this a purely symbolic operation. Note that the type is Maybe (Int -> Int) at this point.
Then you use the <*> combinator to zip both of the Maybe contexts together, and simultaneously apply the foo 3 partially-evaluated function to its second argument.
I personally like this form, which is also equivalent:
liftA2 foo (Just 3) (Just 4)
We're not finished yet though: all the above suggestions give a result of type Maybe (Maybe Int). To flatten that into a Maybe Int, that's where you actually need the monad interface. One option is
join $ foo <$> Just 3 <*> Just 4
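For comparison, here is a small sketch that puts the variants from this answer next to each other, each flattened with join. The names on the left-hand sides are invented for this example; foo is the function from the question.

```haskell
import Control.Applicative (liftA2)
import Control.Monad (join)

foo :: Int -> Int -> Maybe Int
foo 0 0 = Nothing
foo a b = Just $ a + b

-- Curried applicative style, flattened from Maybe (Maybe Int) to Maybe Int.
viaApplicative :: Maybe Int
viaApplicative = join (foo <$> Just 3 <*> Just 4)

-- The liftA2 spelling of the same thing.
viaLiftA2 :: Maybe Int
viaLiftA2 = join (liftA2 foo (Just 3) (Just 4))

-- The explicit tupling version: build Just (3, 4), then apply uncurried foo.
viaTuple :: Maybe Int
viaTuple = join (uncurry foo <$> ((,) <$> Just 3 <*> Just 4))

-- When foo itself fails, join collapses Just Nothing to Nothing.
bothZero :: Maybe Int
bothZero = join (foo <$> Just 0 <*> Just 0)
```

The first three all give Just 7, and bothZero gives Nothing.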

Can any recursive definition be rewritten using foldr?

Say I have a general recursive definition in haskell like this:
foo a0 a1 ... = base_case
foo b0 b1 ...
| cond1 = recursive_case_1
| cond2 = recursive_case_2
...
Can it always be rewritten using foldr? Can it be proved?
If we interpret your question literally, we can write const value foldr to achieve any value, as @DanielWagner pointed out in a comment.
A more interesting question is whether we can instead forbid general recursion from Haskell, and "recurse" only through the eliminators/catamorphisms associated to each user-defined data type, which are the natural generalization of foldr to inductively defined data types. This is, essentially, (higher-order) primitive recursion.
When this restriction is performed, we can only compose terminating functions (the eliminators) together. This means that we can no longer define non terminating functions.
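To make "recursing only through eliminators" concrete, here is a minimal sketch for a unary natural number type. The type Nat and the names foldNat, add, toInt and sumList are assumptions made up for this illustration. Note that foldNat is itself written with ordinary recursion below; in the restricted language it would be provided as a primitive, the way foldr would be for lists.

```haskell
data Nat = Z | S Nat

-- Eliminator (catamorphism) for Nat: replace Z by z and every S by s.
foldNat :: r -> (r -> r) -> Nat -> r
foldNat z _ Z     = z
foldNat z s (S n) = s (foldNat z s n)

-- Addition without explicit recursion: wrap m in one S per S in the second argument.
add :: Nat -> Nat -> Nat
add m = foldNat m S

-- Conversion to Int, again only through the eliminator.
toInt :: Nat -> Int
toInt = foldNat 0 (+ 1)

-- The list analogue: summing with foldr instead of explicit recursion.
sumList :: [Int] -> Int
sumList = foldr (+) 0
```

Every function defined this way terminates on finite input, because the eliminator consumes one constructor per step.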
As a first example, we lose the trivial recursion
f x = f x
-- or even
a = a
since, as said, the language becomes total.
More interestingly, the general fixed point operator is lost.
fix :: (a -> a) -> a
fix f = f (fix f)
A more intriguing question is: what about the total functions we can express in Haskell? We do lose all the non-total functions, but do we lose any of the total ones?
Computability theory states that, since the language becomes total (no more non termination), we lose expressiveness even on the total fragment.
The proof is a standard diagonalization argument. Fix any enumeration of programs in the total fragment so that we can speak of "the i-th program".
Then, let eval i x be the result of running the i-th program on the natural x as input (for simplicity, assume this is well typed, and that the result is a natural). Note that, since the language is total, a result must exist. Moreover, eval can be implemented in the unrestricted Haskell language, since we can write an interpreter of Haskell in Haskell (left as an exercise :-P), and that would work just as well for the fragment. Then, we simply take
f n = succ $ eval n n
The above is a total function (a composition of total functions) which can be expressed in Haskell, but not in the fragment. Indeed, otherwise there would be a program to compute it, say the i-th program. In such case we would have
eval i x = f x
for all x. But then,
eval i i = f i = succ $ eval i i
which is impossible -- contradiction. QED.
In type theory, it is indeed the case that you can elaborate all definitions by dependent pattern-matching into ones only using eliminators (a more strongly-typed version of folds, the generalisation of lists' foldr).
See e.g. Eliminating Dependent Pattern Matching (pdf)

What's the reason of 'let rec' for impure functional language OCaml?

In the book Real World OCaml, the authors explain why OCaml uses let rec for defining recursive functions.
OCaml distinguishes between nonrecursive definitions (using let) and recursive definitions (using let rec) largely for technical reasons: the type-inference algorithm needs to know when a set of function definitions are mutually recursive, and for reasons that don't apply to a pure language like Haskell, these have to be marked explicitly by the programmer.
What are the technical reasons that force OCaml to require let rec while pure functional languages do not?
When you define the semantics of function definition, as a language designer, you have a choice: either make the name of the function visible in the scope of its own body, or not. Both choices are perfectly legal; for example, C-family languages, despite being far from functional, do have the names of definitions visible in their own scope (this extends to all definitions in C, making int x = x + 1 legal). The OCaml language decides to give us the extra flexibility of making the choice ourselves. And that's really great. They decided to make the name invisible by default, a fairly decent solution, since most of the functions that we write are non-recursive.
As for the quote, it doesn't really correspond to function definitions, the most common use of the rec keyword. It is mostly about "Why the scope of function definition doesn't extend to the body of the module". This is a completely different question.
After some research I've found a very similar question that has an answer that might satisfy you. A quote from it:
So, given that the type checker needs to know about which sets of
definitions are mutually recursive, what can it do? One possibility is
to simply do a dependency analysis on all the definitions in a scope,
and reorder them into the smallest possible groups. Haskell actually
does this, but in languages like F# (and OCaml and SML) which have
unrestricted side-effects, this is a bad idea because it might reorder
the side-effects too. So instead it asks the user to explicitly mark
which definitions are mutually recursive, and thus by extension where
generalization should occur.
Even without any reordering, arbitrary non-pure expressions can occur in a function definition (a side effect of the definition, not of evaluation), which makes it impossible to build the dependency graph. Consider demarshaling and executing a function from a file.
To summarize, we have two usages of let rec construct, one is to create a self recursive function, like
let rec seq acc = function
| 0 -> acc
| n -> seq (acc+1) (n-1)
Another is to define mutually recursive functions:
let rec odd n =
  if n = 0 then false
  else if n = 1 then true else even (n - 1)
and even n =
  if n = 0 then true
  else if n = 1 then false else odd (n - 1)
In the first case, there are no technical reasons to stick to one or the other solution. This is just a matter of taste.
The second case is harder. When inferring type you need to split all function definitions into clusters consisting of mutually depending definitions, in order to narrow typing environment. In OCaml it is harder to make, since you need to take into account side-effects. (Or you can continue without splitting it into principal components, but this will lead to another issue – your type system will be more restrictive, i.e., will disallow more valid programs).
But, revisiting the original question and the quote from RWO, I'm still pretty sure that there are no technical reasons for adding the rec flag. Consider SML, which has the same problems but still has rec enabled by default. There is a technical reason for the let ... and ... syntax for defining a set of mutually recursive functions. In SML this syntax doesn't require us to put the rec flag; in OCaml it does, thus giving us more flexibility, like the ability to swap two values with a let x = y and y = x expression.
What are the technical reasons that force OCaml to require let rec while pure functional languages do not?
Recursiveness is a strange beast. It has a relation to purity, but it's a little more oblique than this. To be clear, you could write "alterna-Haskell" which retains its purity, its laziness but does not have recursively bound lets by default and demands some kind of rec marker just as OCaml does. Some would even prefer this.
In essence, there are just many different kinds of "let"s possible. If we compare let and let rec in OCaml we'll see a small difference. In static formal semantics, we might write
Γ ⊢ E : A    Γ, x : A ⊢ F : B
-----------------------------
   Γ ⊢ let x = E in F : B
which says that if we can prove in a variable environment Γ that E has type A and if we can prove in the same variable environment Γ augmented with x : A that F : B then we can prove that in the variable environment Γ let x = E in F has type B.
The thing to watch is the Γ argument. This is just a list of ("variable name", "type") pairs like [(x, int); (y, string)] and augmenting the list like Γ, x : A just means consing (x, A) on to it (sorry that the syntax is flipped).
In particular, let's write the same formalism for let rec
Γ, x : A ⊢ E : A    Γ, x : A ⊢ F : B
-------------------------------------
     Γ ⊢ let rec x = E in F : B
In particular, the only difference is that neither of our premises work in the plain Γ environment; both are allowed to assume the existence of the x variable.
In this sense, let and let rec are simply different beasts.
So what does it mean to be pure? Under the strictest definition, one that Haskell doesn't even meet, we must eliminate all effects including non-termination. The only way to achieve this is to pull away our ability to write unrestricted recursion and replace it only carefully.
There exist plenty of languages without recursion. Perhaps the most important one is the Simply Typed Lambda Calculus. In its basic form it is regular lambda calculus but augmented with a typing discipline where types are a bit like
type ty =
| Base
| Arr of ty * ty
It turns out that STLC cannot represent recursion: the Y combinator, and all other fixed-point cousin combinators, cannot be typed. Thus, STLC is not Turing Complete.
It is however uncompromisingly pure. It achieves that purity with the bluntest of instruments, however, by completely outlawing recursion. What we'd really like is some kind of balanced, careful recursion which doesn't lead to non-termination---we'll still be Turing Incomplete, but not so crippled.
Some languages try this game. There are clever ways of adding typed recursion back along a division between data and codata which ensures that you cannot write non-terminating functions. If you're interested, I suggest learning a bit of Coq.
But OCaml's goal (and Haskell's as well) is not to be delicate here. Both languages are uncompromisingly Turing Complete (and therefore "practical"). So let's discuss some more blunt ways of augmenting the STLC with recursion.
The bluntest of the bunch is to add a single built-in function called fix
val fix : ('a -> 'a) -> 'a
or, in more genuine OCaml-y notation which requires eta-expansion
val fix : (('a -> 'b) -> ('a -> 'b)) -> ('a -> 'b)
Now, remember that we're only considering a primitive STLC with fix added. We can indeed write fix (the latter one at least) in OCaml, but that's cheating at the moment. What does fix buy the STLC as a primitive?
It turns out that the answer is: "everything". STLC + Fix (basically a language called PCF) is impure and Turing Complete. It's also simply tremendously difficult to use.
So this is the final hurdle to jump: how do we make fix easier to work with? By adding recursive bindings!
Already, STLC has a let construction. You can think of it as just syntax sugar:
let x = E in F ----> (fun x -> F) (E)
but once we've added fix we also have the power to introduce let rec bindings
let rec x a = E in F ----> (fun x -> F) (fix (fun x a -> E))
At this point it should again be clear: let and let rec are very different beasts. They embody different levels of linguistic power and let rec is a window to allow fundamental impurity through Turing Completeness and its partner-effect non-termination.
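The let rec translation above can be tried out in Haskell, where fix is available from Data.Function. The sketch below is only an illustration in Haskell syntax rather than in the STLC itself, and the names factRec and factFix are made up for it.

```haskell
import Data.Function (fix)

-- Ordinary recursive binding: go refers to itself inside its own definition.
factRec :: Integer -> Integer
factRec n =
  let go k = if k <= 0 then 1 else k * go (k - 1)
  in go n

-- The same function following the translation
--   let rec x a = E in F  ---->  (fun x -> F) (fix (fun x a -> E))
-- Here the lambda bound to fix never mentions itself by name; the
-- recursive reference arrives through the extra parameter go.
factFix :: Integer -> Integer
factFix n =
  (\go -> go n) (fix (\go k -> if k <= 0 then 1 else k * go (k - 1)))
```

Both functions compute the same factorial; only the second makes the use of fix explicit.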
So, at the end of the day, it's a little amusing that Haskell, the purer of the two languages, made the interesting choice of abolishing plain let bindings. That's really the only difference: there is no syntax for representing a non-recursive binding in Haskell.
At this point it's essentially just a style decision. The authors of Haskell determined that recursive bindings were so useful that one might as well assume that every binding is recursive (and mutually so, a can of worms ignored in this answer so far).
On the other hand, OCaml gives you the ability to be totally explicit about the kind of binding you choose, let or let rec!
I think this has nothing to do with being purely functional, it is just a design decision that in Haskell you are not allowed to do
let a = 0;;
let a = a + 1;;
whereas you can do it in Caml.
In Haskell this code won't work because let a = a + 1 is interpreted as a recursive definition and will not terminate.
In Haskell you don't have to specify that a definition is recursive simply because you can't create a non-recursive one (so the keyword rec is everywhere but is not written).
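A small Haskell sketch of both sides of this point follows; the names ones and incremented are invented for the example. Because every let is recursive, a binding can refer to itself usefully, and "rebinding" a name in terms of its old value needs a fresh name.

```haskell
-- A let binding may refer to itself: this builds an infinite list of ones.
ones :: [Int]
ones = let xs = 1 : xs in xs

-- The OCaml-style rebinding has to use a new name in Haskell, because
-- let a = a + 1 would define a in terms of itself and loop when forced.
incremented :: Int
incremented =
  let a  = 0
      a' = a + 1
  in a'
```

Here take 3 ones gives [1,1,1] and incremented is 1.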
I am not an expert, but I'll make a guess until the truly knowledgeable guys show up. In OCaml there can be side effects that happen during the definition of a function:
let rec f =
let () = Printf.printf "hello\n" in
fun x -> if x <= 0 then 12 else 1 + f (x - 1)
This means that the order of function definitions must be preserved in some sense. Now imagine that two distinct sets of mutually recursive functions are interleaved. It doesn't seem at all easy for the compiler to preserve the order while processing them as two separate mutually recursive sets of definitions.
The use of let rec ... and means that distinct sets of mutually recursive function definitions can't be interleaved in OCaml as they can in Haskell. Haskell doesn't have side effects (in some sense), so definitions can be freely reordered.
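To illustrate the Haskell side of this, here is a sketch in which a mutually recursive pair is used before it is defined and has an unrelated definition interleaved between its two halves. All names (parity, isEven, isOdd, double) are hypothetical, and the parity functions assume a non-negative argument.

```haskell
-- Uses isEven before it is defined; top-level order does not matter.
parity :: Int -> String
parity n = if isEven n then "even" else "odd"

-- First half of a mutually recursive pair (non-negative inputs assumed).
isEven :: Int -> Bool
isEven 0 = True
isEven n = isOdd (n - 1)

-- An unrelated definition sitting between the two halves.
double :: Int -> Int
double = (* 2)

-- Second half of the pair; no rec or and marker is needed.
isOdd :: Int -> Bool
isOdd 0 = False
isOdd n = isEven (n - 1)
```

The compiler works out the dependency groups on its own, which is safe here because evaluating a definition has no side effects.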
It's not a question of purity, it's a question of specifying what environment the typechecker should check an expression in. It actually gives you more power than you would have otherwise. For example (I'm going to write Standard ML here because I know it better than OCaml, but I believe the typechecking process is pretty much the same for the two languages), it lets you distinguish between these cases:
val foo : int = 5
val foo = fn (x) => if x = foo then 0 else 1
Now as of the second redefinition, foo has the type int -> int. On the other hand,
val foo : int = 5
val rec foo = fn (x) => if x = foo then 0 else 1
does not typecheck, because the rec means that the typechecker has already decided that foo has been rebound to the type 'a -> int, and when it tries to figure out what that 'a needs to be, there is a unification failure because x = foo forces foo to have a numeric type, which it doesn't.
It can certainly "look" more imperative, because the case without rec allows you to do things like this:
val foo : int = 5
val foo = foo + 1
val foo = foo + 1
and now foo has the value 7. That's not because it's been mutated, however --- the name foo has been rebound 3 times, and it just so happens that each of those bindings shadowed a previous binding of a variable named foo. It's the same as this:
val foo : int = 5
val foo' = foo + 1
val foo'' = foo' + 1
except that foo and foo' are no longer available in the environment after the identifier foo has been rebound. The following are also legal:
val foo : int = 5
val foo : real = 5.0
which makes it clearer that what's happening is shadowing of the original definition, rather than a side effect.
Whether or not it's stylistically a good idea to rebind identifiers is questionable -- it can get confusing. It can be useful in some situations (e.g. rebinding a function name to a version of itself that prints debugging output).
I'd say that in OCaml they are trying to make REPL and source files work the same way. So, it's perfectly reasonable to redefine some function in REPL; therefore, they have to allow it in the source as well. Now, if you use the (redefined) function in itself, OCaml needs some way of knowing which of the definitions to use: the previous one or the new one.
In Haskell they've just given up and accepted that the REPL works differently from source files.

Does a function in Haskell always evaluate its return value?

I'm trying to better understand Haskell's laziness, such as when it evaluates an argument to a function.
From this source:
But when a call to const is evaluated (that’s the situation we are interested in, here, after all), its return value is evaluated too ... This is a good general principle: a function obviously is strict in its return value, because when a function application needs to be evaluated, it needs to evaluate, in the body of the function, what gets returned. Starting from there, you can know what must be evaluated by looking at what the return value depends on invariably. Your function will be strict in these arguments, and lazy in the others.
So a function in Haskell always evaluates its own return value? If I have:
foo :: Num a => [a] -> [a]
foo [] = []
foo (_:xs) = map (* 2) xs
head (foo [1..]) -- = 4
According to the above paragraph, map (* 2) xs, must be evaluated. Intuitively, I would think that means applying the map to the entire list- resulting in an infinite loop.
But, I can successfully take the head of the result. I know that : is lazy in Haskell, so does this mean that evaluating map (* 2) xs just means constructing something else that isn't fully evaluated yet?
What does it mean to evaluate a function applied to an infinite list? If the return value of a function is always evaluated when the function is evaluated, can a function ever actually return a thunk?
Edit:
bar x y = x
var = bar (product [1..]) 1
This code doesn't hang. When I create var, does it not evaluate its body? Or does it set bar to product [1..] and not evaluate that? If the latter, bar is not returning its body in WHNF, right, so did it really 'evaluate' x? How could bar be strict in x if it doesn't hang on computing product [1..]?
First of all, Haskell does not specify when evaluation happens so the question can only be given a definite answer for specific implementations.
The following is true for all non-parallel implementations that I know of, like ghc, hbc, nhc, hugs, etc (all G-machine based, btw).
BTW, something to remember is that when you hear "evaluate" for Haskell it normally means "evaluate to WHNF".
Unlike strict languages you have to distinguish between two "callers" of a function, the first is where the call occurs lexically, and the second is where the value is demanded. For a strict language these two always coincide, but not for a lazy language.
Let's take your example and complicate it a little:
foo [] = []
foo (_:xs) = map (* 2) xs
bar x = (foo [1..], x)
main = print (head (fst (bar 42)))
The foo function occurs in bar. Evaluating bar will return a pair, and the first component of the pair is a thunk corresponding to foo [1..]. So bar is what would be the caller in a strict language, but in the case of a lazy language it doesn't call foo at all, instead it just builds the closure.
Now, in the main function we actually need the value of head (fst (bar 42)) since we have to print it. So the head function will actually be called. The head function is defined by pattern matching, so it needs the value of the argument. So fst is called. It too is defined by pattern matching and needs its argument so bar is called, and bar will return a pair, and fst will evaluate and return its first component. And now finally foo is "called"; and by called I mean that the thunk is evaluated (entered as it's sometimes called in TIM terminology), because the value is needed. The only reason the actual code for foo is called is that we want a value. So foo had better return a value (i.e., a WHNF). The foo function will evaluate its argument and end up in the second branch. Here it will tail call into the code for map. The map function is defined by pattern match and it will evaluate its argument, which is a cons. So map will return the following {(*2) y} : {map (*2) ys}, where I have used {} to indicate a closure being built. So as you can see map just returns a cons cell with the head being a closure and the tail being a closure.
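The walkthrough above can be checked with a small experiment. In the sketch below, the type signatures and the name firstDoubled are additions for this example. Passing undefined as the argument of bar shows that the second component of the pair is never demanded: if it were, the program would crash.

```haskell
foo :: [Integer] -> [Integer]
foo []       = []
foo (_ : xs) = map (* 2) xs

-- Returns a pair whose first component is an unevaluated call to foo.
bar :: a -> ([Integer], a)
bar x = (foo [1 ..], x)

-- head demands the first cons cell of foo [1 ..] and nothing more;
-- the undefined passed to bar is never touched.
firstDoubled :: Integer
firstDoubled = head (fst (bar undefined))
```

Evaluating firstDoubled gives 4, even though foo [1 ..] describes an infinite list.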
To understand the operational semantics of Haskell better I suggest you look at some paper describing how to translate Haskell to some abstract machine, like the G-machine.
I always found that the term "evaluate," which I had learned in other contexts (e.g., Scheme programming), always got me all confused when I tried to apply it to Haskell, and that I made a breakthrough when I started to think of Haskell in terms of forcing expressions instead of "evaluating" them. Some key differences:
"Evaluation," as I learned the term before, strongly connotes mapping expressions to values that are themselves not expressions. (One common technical term here is "denotations.")
In Haskell, the process of forcing is IMHO most easily understood as expression rewriting. You start with an expression, and you repeatedly rewrite it according to certain rules until you get an equivalent expression that satisfies a certain property.
In Haskell the "certain property" has the unfriendly name weak head normal form ("WHNF"), which really just means that the expression is either a nullary data constructor or an application of a data constructor.
Let's translate that to a very rough set of informal rules. To force an expression expr:
If expr is a nullary constructor or a constructor application, the result of forcing it is expr itself. (It's already in WHNF.)
If expr is a function application f arg, then the result of forcing it is obtained this way:
Find the definition of f.
Can you pattern match this definition against the expression arg? If not, then force arg and try again with the result of that.
Substitute the pattern match variables in the body of f with the parts of (the possibly rewritten) arg that correspond to them, and force the resulting expression.
One way of thinking of this is that when you force an expression, you're trying to rewrite it minimally to reduce it to an equivalent expression in WHNF.
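The rules above can be observed with seq, which forces its first argument to WHNF and then returns its second. The following is a rough sketch with invented names; each undefined would crash the program if it were forced, so the fact that these values can be computed shows how little forcing takes place.

```haskell
-- A constructor application is already in WHNF, so nothing inside is forced.
consIsWhnf :: String
consIsWhnf = (undefined : undefined :: [Int]) `seq` "cons cell: already in WHNF"

justIsWhnf :: String
justIsWhnf = (Just undefined :: Maybe Int) `seq` "Just: already in WHNF"

-- A function application is rewritten only until a constructor is on the
-- outside: map produces its first cons cell without touching the tail.
mapStopsAtCons :: String
mapStopsAtCons =
  map (* 2) (1 : undefined :: [Int]) `seq` "map stopped at the first cons"
```

All three strings can be printed without hitting undefined.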
Let's apply this to your example:
foo :: Num a => [a] -> [a]
foo [] = []
foo (_:xs) = map (* 2) xs
-- We want to force this expression:
head (foo [1..])
We will need definitions for head and map:
head [] = undefined
head (x:_) = x
map _ [] = []
map f (x:xs) = f x : map f xs
-- Not real code, but a rule we'll be using for forcing infinite ranges.
[n..] ==> n : [(n+1)..]
So now:
head (foo [1..]) ==> head (foo (1 : [2..]))        -- using the forcing rule for [n..]
                 ==> head (map (*2) [2..])         -- using the definition of foo
                 ==> head (map (*2) (2 : [3..]))   -- using the forcing rule for [n..]
                 ==> head (2*2 : map (*2) [3..])   -- using the definition of map
                 ==> 2*2                           -- using the definition of head
                 ==> 4                             -- using the definition of *
I believe the idea must be that in a lazy language if you're evaluating a function application, it must be because you need the result of the application for something. So whatever reason caused the function application to be reduced in the first place is going to continue to need to reduce the returned result. If we didn't need the function's result we wouldn't be evaluating the call in the first place, the whole application would be left as a thunk.
A key point is that the standard "lazy evaluation" order is demand-driven. You only evaluate what you need. Evaluating more risks violating the language spec's definition of "non-strict semantics" and looping or failing for some programs that should be able to terminate; lazy evaluation has the interesting property that if any evaluation order can cause a particular program to terminate, so can lazy evaluation.
But if we only evaluate what we need, what does "need" mean? Generally it means either
a pattern match needs to know what constructor a particular value is (e.g. I can't know what branch to take in your definition of foo without knowing whether the argument is [] or _:xs)
a primitive operation needs to know the entire value (e.g. the arithmetic circuits in the CPU can't add or compare thunks; I need to fully evaluate two Int values to call such operations)
the outer driver that executes the main IO action needs to know what the next thing to execute is
So say we've got this program:
foo :: Num a => [a] -> [a]
foo [] = []
foo (_:xs) = map (* 2) xs
main :: IO ()
main = print (head (foo [1..]))
To execute main, the IO driver has to evaluate the thunk print (head (foo [1..])) to work out that it's print applied to the thunk head (foo [1..]). print needs to evaluate its argument in order to print it, so now we need to evaluate that thunk.
head starts by pattern matching its argument, so now we need to evaluate foo [1..], but only to WHNF - just enough to tell whether the outermost list constructor is [] or :.
foo starts by pattern matching on its argument. So we need to evaluate [1..], also only to WHNF. That's basically 1 : [2..], which is enough to see which branch to take in foo.2
The : case of foo (with xs bound to the thunk [2..]) evaluates to the thunk map (*2) [2..].
So foo is evaluated, and didn't evaluate its body. However, we only did that because head was pattern matching to see if we had [] or x : _. We still don't know that, so we must immediately continue to evaluate the result of foo.
This is what the article means when it says functions are strict in their result. Given that a call to foo is evaluated at all, its result will also be evaluated (and so, anything needed to evaluate the result will also be evaluated).
But how far it needs to be evaluated depends on the calling context. head is only pattern matching on the result of foo, so it only needs a result to WHNF. We can get an infinite list to WHNF (we already did so, with 1 : [2..]), so we don't necessarily get in an infinite loop when evaluating a call to foo. But if head were some sort of primitive operation implemented outside of Haskell that needed to be passed a completely evaluated list, then we'd be evaluating foo [1..] completely, and thus would never finish in order to come back to head.
So, just to complete my example, we're evaluating map (* 2) [2..].
map pattern matches its second argument, so we need to evaluate [2..] as far as 2 : [3..]. That's enough for map to return (* 2) 2 : map (* 2) [3..], which is in WHNF even though both sides of the : are still thunks. And so it's done, we can finally return to head.
head ((* 2) 2 : map (* 2) [3..]) doesn't need to inspect either side of the :, it just needs to know that there is one so it can return the left side. So it just returns the unevaluated thunk (* 2) 2.
Again though, we only evaluated the call to head this far because print needed to know what its result is, so although head doesn't evaluate its result, its result is always evaluated whenever the call to head is.
(* 2) 2 evaluates to 4, print converts that into the string "4" (via show), and the line gets printed to the output. That was the entire main IO action, so the program is done.
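The walkthrough can be observed with a small sketch that uses Debug.Trace.trace from base. The local helper double is an assumption added for illustration; it reports on stderr each time an element is actually computed:

```haskell
import Debug.Trace (trace)

foo :: [Integer] -> [Integer]
foo []     = []
foo (_:xs) = map double xs
  where
    -- Reports (on stderr) each time an element is actually computed.
    double x = trace ("doubling " ++ show x) (x * 2)

main :: IO ()
main = print (head (foo [1 ..]))
```

Only one "doubling" message should appear before the 4 is printed, because head demands nothing beyond the first element of the result.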
1 Implementations of Haskell, such as GHC, do not always use "standard lazy evaluation", and the language spec does not require it. If the compiler can prove that something will always be needed, or cannot loop/error, then it's safe to evaluate it even when lazy evaluation wouldn't (yet) do so. This can often be faster so GHC optimizations do actually do this.
2 I'm skipping over a few details here, like that print does have some non-primitive implementation we could step inside and lazily evaluate, and that [1..] could be further expanded to the functions that actually implement that syntax.
Not necessarily. Haskell is lazy, meaning that it only evaluates when it needs to. This has some interesting effects. If we take the below code, for example:
-- File: lazinessTest.hs
(>?) :: a -> b -> b
a >? b = b
main = (putStrLn "Something") >? (putStrLn "Something else")
This is the output of the program:
$ ./lazinessTest
Something else
This indicates that putStrLn "Something" is never evaluated. But it's still being passed to the function, in the form of a 'thunk'. These 'thunks' are unevaluated values that, rather than being concrete values, are like a breadcrumb-trail of how to compute the value. This is how Haskell laziness works.
In our case, two 'thunks' are passed to >?, but only one is passed out, meaning that only one is evaluated in the end. The same applies to const, where the second argument can be safely ignored and is therefore never computed. As for map, laziness means the rest of the list is never computed because nothing asks for it; only what is needed gets evaluated, in your case the second element of the original list.
However, it's best to leave the thinking about laziness to the compiler and keep coding, unless you're dealing with IO, in which case you really, really should think about laziness, because you can easily go wrong, as I've just demonstrated.
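As a rough sketch of the difference, the following contrasts an ignored argument with two actions that are explicitly sequenced using >> from the Prelude. It reuses the >? operator from the example above:

```haskell
(>?) :: a -> b -> b
_ >? b = b

main :: IO ()
main = do
  -- const never looks at its second argument, so undefined is harmless.
  print (const (1 :: Int) (undefined :: String))
  -- The first action is only passed around as a value; it is never run.
  putStrLn "Something" >? putStrLn "Something else"
  -- (>>) combines both actions into one, so both are run in order.
  putStrLn "Something" >> putStrLn "Something else"
```

The second statement prints only "Something else", while the third prints both lines.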
There are lots and lots of online articles on the Haskell wiki to look at, if you want more detail.
A function call can evaluate to a value of its return type:
head (x:_) = x
or to an exception/error:
head _ = error "Head: List is empty!"
or to bottom (⊥), a computation that never finishes:
a = a
b = last [1 ..]
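A small sketch of the first two outcomes, assuming only Control.Exception from base. The function is renamed to myHead here (a hypothetical name) to avoid clashing with Prelude.head, and the non-terminating cases are deliberately left out:

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Local copy of the definition above, renamed to avoid clashing with Prelude.head.
myHead :: [a] -> a
myHead (x:_) = x
myHead _     = error "Head: List is empty!"

main :: IO ()
main = do
  -- Normal case: a value of the return type.
  print (myHead [1, 2, 3 :: Int])
  -- Error case: forcing the result raises an exception that IO code can catch.
  r <- try (evaluate (myHead ([] :: [Int]))) :: IO (Either ErrorCall Int)
  putStrLn (either (const "caught the error call") show r)
```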

SML conversions to Haskell

A few basic questions, for converting SML code to Haskell.
1) I am used to having local embedded expressions in SML code, for example test expressions, prints, etc., which function as local tests and produce output when the code is loaded (evaluated).
In Haskell it seems that the only way to get results (evaluation) is to add code in a module, and then go to main in another module and add something to invoke and print results.
Is this right? In GHCi I can type expressions and see the results, but can this be automated?
Having to go to the top level main for each test evaluation seems inconvenient to me - maybe I just need to shift my paradigm for laziness.
2) In SML I can do pattern matching and unification on a returned result, e.g.
val myTag(x) = somefunct(a,b,c);
and get the value of x after a match.
Can I do something similar in Haskell easily, without writing separate extraction functions?
3) How do I do a constructor with a tuple argument, i.e. uncurried?
In SML:
datatype Thing = Info of int * int;
but in Haskell, I tried:
data Thing = Info ( Int Int)
which fails. ("Int is applied to too many arguments in the type:A few Int Int")
The curried version works fine,
data Thing = Info Int Int
but I wanted un-curried.
Thanks.
This question is a bit unclear -- you're asking how to evaluate functions in Haskell?
If it is about inserting debug and tracing into pure code, this is typically only needed for debugging. To do this in Haskell, you can use Debug.Trace.trace, in the base package.
If you're concerned about calling functions, Haskell programs evaluate from main downwards, in dependency order. In GHCi you can, however, import modules and call any top-level function you wish.
You can return the original argument to a function, if you wish, by making it part of the function's result, e.g. with a tuple:
f x = (x, y)
  where y = g a b c
Or do you mean to return either one value or another? Then use a tagged union (sum type), such as Either:
f x = if x > 0 then Left x
               else Right (g a b c)
How do I do a constructor with a tuple argument, i.e. uncurried in SML
Using the (,) constructor. E.g.
data T = T (Int, Int)
though more Haskell-like would be:
data T = T Int Bool
and those should probably be strict fields in practice:
data T = T !Int !Bool
Debug.Trace allows you to print debug messages inline. However, since these functions use unsafePerformIO, they might behave in unexpected ways compared to a call-by-value language like SML.
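To illustrate the kind of surprise meant here, a minimal sketch (the function square is a hypothetical example): the trace message appears only when a value is actually demanded, not when it is bound.

```haskell
import Debug.Trace (trace)

square :: Int -> Int
square x = trace ("square " ++ show x) (x * x)

main :: IO ()
main = do
  let unused = square 3   -- never demanded, so its message never appears
      used   = square 4   -- demanded below
  print (used + used)
```

In a call-by-value language both calls would log. Here only "square 4" should show up on stderr, and typically just once, since the let-bound result is shared between both uses.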
I think the @ syntax is what you're looking for here:
data MyTag = MyTag Int Bool String
someFunct :: MyTag -> (MyTag, Int, Bool, String)
someFunct x@(MyTag a b c) = (x, a, b, c) -- x is bound to the entire argument
In Haskell, tuple types are separated by commas, e.g., (t1, t2), so what you want is:
data Thing = Info (Int, Int)
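A short sketch comparing the two forms. Thing2, Info2, sumThing and fromPair are hypothetical names added for the comparison; uncurry is the standard Prelude function:

```haskell
-- Uncurried: the constructor takes a single tuple argument.
data Thing = Info (Int, Int) deriving (Show, Eq)

-- Curried: the constructor takes two arguments (hypothetical name for comparison).
data Thing2 = Info2 Int Int deriving (Show, Eq)

-- Pattern matching looks through the constructor and the tuple in one go.
sumThing :: Thing -> Int
sumThing (Info (a, b)) = a + b

-- uncurry turns the curried constructor into a function on pairs.
fromPair :: (Int, Int) -> Thing2
fromPair = uncurry Info2

main :: IO ()
main = do
  print (Info (1, 2))             -- Info (1,2)
  print (sumThing (Info (1, 2)))  -- 3
  print (fromPair (3, 4))         -- Info2 3 4
```

So the curried constructor can still be applied to a tuple when needed, which is one reason the curried form is usually preferred.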
Reading the other answers, I think I can provide a few more examples and one recommendation.
data ThreeConstructors = MyTag Int | YourTag (String,Double) | HerTag [Bool]
someFunct :: Char -> Char -> Char -> ThreeConstructors
MyTag x = someFunct 'a' 'b' 'c'
This is like the "let MyTag x = someFunct a b c" examples, but it is at the top level of the module.
As you have noticed, Haskell's top level can define commands, but there is no way to automatically run any code merely because your module has been imported by another module. This is entirely different from Scheme or SML. In Scheme the file is interpreted as being executed form-by-form, but Haskell's top level is only declarations. Thus libraries cannot do normal things like run initialization code when loaded; they have to provide a "pleaseRunMe :: IO ()" kind of command to do any initialization.
As you point out, this means running all the tests requires some boilerplate code to list them all. You can look under Hackage's Testing group for libraries to help, such as test-framework-th.
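For a sense of what that boilerplate looks like without any testing library, here is a hand-rolled sketch using only base. The list name tests and the two checks in it are hypothetical placeholders:

```haskell
import Control.Monad (unless)
import System.Exit (exitFailure)

-- Every test has to be listed explicitly; nothing runs just by being defined.
tests :: [(String, Bool)]
tests =
  [ ("addition", 1 + 1 == (2 :: Int))
  , ("reverse",  reverse [1, 2, 3 :: Int] == [3, 2, 1])
  ]

main :: IO ()
main = do
  let failures = [name | (name, ok) <- tests, not ok]
  mapM_ (putStrLn . ("FAILED: " ++)) failures
  unless (null failures) exitFailure
  putStrLn "all tests passed"
```

Libraries in the Testing group mostly exist to generate or shorten this kind of list.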
For #2, yes, Haskell's pattern matching does the same thing. Both let and where do pattern matching. You can do
let MyTag x = someFunct a b c
in ...
or
...
where MyTag x = someFunct a b c
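A self-contained sketch of both forms. The Tagged type, the definition of someFunct, and the names viaLet and viaWhere are assumptions made up for the example:

```haskell
data Tagged = MyTag Int | YourTag String

-- Hypothetical stand-in for the function being matched on.
someFunct :: Int -> Int -> Int -> Tagged
someFunct a b c = MyTag (a + b + c)

-- Pattern binding in a let: x is bound to the Int inside MyTag.
viaLet :: Int
viaLet =
  let MyTag x = someFunct 1 2 3
  in x * 2

-- The same pattern binding in a where clause.
viaWhere :: Int
viaWhere = x + 1
  where MyTag x = someFunct 1 2 3

main :: IO ()
main = print (viaLet, viaWhere)  -- (12,7)
```

Unlike the SML val binding, these pattern bindings are lazy: if someFunct returned YourTag instead, the match would fail only at the point where x is actually used.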

Resources