What is the order in which Haskell guards are evaluated?
Say that I have a function which returns a Bool:
someFunc :: Bool -> Bool -> Bool
someFunc b1 b2
  | b1 == True && b2 == True = True
  | b1 == True = False
  | b1 == False .....
...
I think it was in connection with monads and do-notation that I read that actions are sometimes not evaluated sequentially. That is, if I have:
do { val1 <- action1
   ; val2 <- action2
   ; action3 }
It might be the case that val2 will be calculated before val1.
Is this the case for guards as well? Can they be evaluated out of order?
If guards were sequential, then if the first statement evaluates to False, and the second evaluates to True, then I can conclude that b2 is False. Does this logic always hold?
Edit: By statements I mean guard 1 to 3
Evaluating the tests within guards can't have any side effects, unlike in procedural languages. So the order of evaluating the comparisons or Boolean connectives doesn't make any difference to the semantics of the program.
Prioritising the branches (that is, each of the lines starting with |) goes from top to bottom. But really "evaluating" is the wrong concept: it would be OK for the compiler to evaluate your b1 == False first, provided it didn't take the third branch until it had checked the first two. (GHC doesn't actually do that; I'm just setting up a straw man.)
Note that in a call to someFunc, the arguments for b1 and b2 might be arbitrarily complex expressions. Haskell's lazy semantics mean that neither of them is evaluated until needed.
Does this logic always hold?
Be careful: if an early guard turns out False, you can’t assume anything about the expressions in it. The compiler might have rearranged them for efficiency, evaluated one out of textual order, then moved on. In your example, if for the first branch it turned out b1 /= True, the compiler might not evaluate b2 at all. So you can’t conclude anything about b2. Indeed b2 might give bottom/infinite calculation, if evaluated.
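To make this concrete, here is a minimal runnable sketch (the third branch is filled in with an assumed result, since the question leaves it open): because (&&) is lazy and guards are tried top to bottom, b2 is never forced when b1 is False.

someFunc :: Bool -> Bool -> Bool
someFunc b1 b2
  | b1 && b2  = True   -- b2 is only forced if b1 is True
  | b1        = False
  | otherwise = True   -- assumed result, for illustration only

main :: IO ()
main = print (someFunc False undefined)  -- prints True; undefined is never evaluated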
It's not just with monads or do-notation (do-notation being just sugar for monadic operations) that expressions are not necessarily evaluated in textual order; that's true of any expression in any context. (The IO monad has some dodgy semantics to make it seem as though "statements" are executed top to bottom.)
Related
Haskell is sometimes said to "replace equals for equals". The following code shows this isn't true under every interpretation of such a sentence. Wikipedia follows that by saying f(x)=f(x) for every x, but that doesn't seem to carry any actual logical content one can test; it would be true by the reflexive law, a tautology.
I think the phrasing needed to make a logical claim like this is more like Leibniz's law (the indiscernibility of identicals), where
x=y implies for every f, f(x)=f(y). That claim fails in the illustration below within Haskell. (We override == to make a partition type, but our function definition can freely ignore this, and it does.)
My question is, can one actually state referential transparency in a way that can be logically tested, and does Haskell actually uphold that logical claim?
module Main (main) where

data Floop = One | Two | Three

instance Eq Floop where
  One   == One   = True
  One   == Two   = False
  One   == Three = False
  Two   == One   = False
  Two   == Two   = True
  Two   == Three = True   --- 2=3
  Three == One   = False
  Three == Two   = True   --- 3=2
  Three == Three = True

shuffle :: Floop -> Floop
shuffle One   = Two
shuffle Two   = Two   --- fix 2
shuffle Three = One   --- move 3

main :: IO ()
main = print ((Two == Three) && (shuffle Two /= shuffle Three))
--- prints "True" proving Haskell violates Leibniz Law
Expanding slightly on what I already said in my comment (thanks @FyodorSolkin for the prod):
You haven't violated referential transparency there, you've just made a pathological Eq instance.
While, as you've observed, the language doesn't forbid you from doing this, nor does it forbid one from making unlawful Functor or Monad instances. (Because it would be totally unfeasible to try to check these laws in practice.) But just because something doesn't cause a compiler error doesn't necessarily mean it's the right thing to do.
So the problem with your example is that while, semantically, (==) in Haskell indeed means "equal", it's just a function, in fact a method of a typeclass - which you can therefore implement however you want. Nothing stops me from defining, for example:
instance Eq (a -> b) where
  _ == _ = True
and suddenly all functions will be considered "equal" under this definition. Clearly referential transparency will be violated if we consider this to be a true definition of equality. My point is that it's not. In fact it's quite obvious what "equality" means for any type which isn't either a function or otherwise depends on or "contains" function types. (It's actually obvious what equality of functions should mean too, it's just impossible for there to be a general algorithm to determine if two arbitrary functions are equal.)
[EDIT: I just remembered it also doesn't make much sense to talk about equality of IO actions. There might be some other abstract types like that where there's no clear definition of what equality would mean.]
To stray into abstract mathematics for a minute: your Eq instance certainly defines an equivalence relation, which is considered to be a sort of "generalised equality" - and indeed is equality if you use the relation to make equivalence classes. But then it's nonsense to try to apply a function to such a domain/type which differs on different elements of the same equivalence class. Such a thing - as in your example - actually fundamentally fails to be a well-defined mathematical function, because you're defining it on the individual elements in a way which fails to respect the equivalence relation.
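To illustrate that last point, here is a hypothetical variant of shuffle (my name, not from the question) that does respect the equivalence relation: since Two == Three, both must be sent to results that are themselves equal.

-- A sketch of a well-defined function on the quotient: it sends the whole
-- equivalence class {Two, Three} to a single result.
respectfulShuffle :: Floop -> Floop
respectfulShuffle One   = Two
respectfulShuffle Two   = Two
respectfulShuffle Three = Two  -- Two == Three, so both map to the same value

-- Now Two == Three implies respectfulShuffle Two == respectfulShuffle Three.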
f(x)=f(x) for every x
is by no means a tautology. In many popular languages, this property does not hold. Consider Java, for instance:
import java.util.*;

public class Transparency {
    static int f(List<Object> xs) {
        xs.add(xs.size());
        return xs.size();
    }

    public static void main(String[] args) {
        List<Object> x = new ArrayList<>();
        System.out.println("Is java referentially transparent? " + (f(x) == f(x)));
    }
}
$ javac Transparency.java
$ java Transparency
Is java referentially transparent? false
Here, because f mutates its input x, it would change behavior if we substitute x's definition into f(x) == f(x): f(new ArrayList<>()) == f(new ArrayList<>()) is in fact true, but when using a variable to reduce duplication it evaluates to false. In Haskell, such a substitution is always valid (disregarding cheats like unsafePerformIO).
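For contrast, here is a minimal Haskell sketch of the same experiment (this f is a hypothetical pure analogue of the Java method, not code from the answer above): substituting x's definition cannot change the result, because f cannot mutate its argument.

-- f "adds" an element, but only to a local view; the argument is untouched.
f :: [Int] -> Int
f xs = length (0 : xs)

main :: IO ()
main = do
  let x = [] :: [Int]
  print (f x == f x)    -- True
  print (f [] == f [])  -- substituting x's definition: still True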
In Haskell, AFAIK, there are no statements, just expressions. That is, unlike in an imperative language like JavaScript, you cannot simply execute code line after line, i.e.
let a = 1
let b = 2
let c = a + b
console.log(c)
Instead, everything is an expression and nothing can simply modify state and return nothing (i.e. a statement). On top of that, everything would be wrapped in a function such that, in order to mimic such an action as above, you'd use the monadic do syntax and thereby hide the underlying nested functions.
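A sketch of what mimicking those four lines looks like in Haskell, first with do notation and then as the single expression it stands for:

main :: IO ()
main = do
  let a = 1
      b = 2
      c = a + b
  print c

-- ...which is just sugar for one nested expression:
main' :: IO ()
main' = let a = 1; b = 2; c = a + b in print c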
Is this the same in OCaml/F#, or can you just have imperative statements?
This is a bit of a complicated topic. Technically, in ML-style languages, everything is an expression. However, there is some syntactic sugar to make it read more like statements. For example, the sample you gave in F# would be:
let a = 1
let b = 2
let c = a + b
printfn "%d" c
However, the compiler silently turns those "statements" into the following expression for you:
let a = 1 in
let b = 2 in
let c = a + b in
printfn "%d" c
Now, the last line here is going to do IO, and unlike in Haskell, it won't change the type of the expression to IO. The type of the expression here is unit. unit is the F# way of expressing "this function doesn't really have a result" in the type system. Of course, if a function doesn't have a result, in a purely functional language it would be pointless to call it; the only reason to call it would be for some side effect. Since Haskell doesn't allow side effects, it uses the IO monad to encode the fact that a function has an IO-producing side effect into the type system.
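A minimal Haskell sketch of that contrast: the effect shows up in the type, so a printing action has type IO () rather than plain ().

greet :: IO ()            -- the IO in the type records the side effect
greet = putStrLn "hello"

main :: IO ()
main = greet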
F# and other ML-based languages do allow side effects like IO, so they have the unit type to represent functions that only perform side effects, like printing. When designing your application, you will generally want to avoid unit-returning functions except for things like logging or printing. If you feel so inclined, you can even use F#'s monad-ish feature, Computation Expressions, to encapsulate your side effects for you.
Not to be picky, but there's no language OCaml/F# :-)
To answer for OCaml: OCaml is not a pure functional language. It supports side effects directly through mutability, I/O, and exceptions. In many cases it treats such constructs as expressions with the value (), the single value of type unit.
Expressions of type unit can appear in a sequence separated by ;:
let s = ref 0 in
while !s < 10 do
  Printf.printf "%d\n" !s;  (* This has type unit *)
  incr s                    (* This has type unit *)
done                        (* The while as a whole has type unit *)
Update
More specifically, ; ignores the value of the first expression and returns the value of the second expression. The first expression should have type unit but this isn't absolutely required.
# print_endline "hello"; 44 ;;
hello
- : int = 44
# 43 ; 44 ;;
Warning 10: this expression should have type unit.
- : int = 44
The ; operator is right associative, so you can write a ;-separated sequence of expressions without extra parentheses. It has the value of the last (rightmost) expression.
To answer the question, we need to define what an expression is and what a statement is.
Distinction between expressions and statements
In layman's terms, an expression is something that evaluates (reduces) to a value. It is basically something that may occur on the right-hand side of the assignment operator. By contrast, a statement is a directive that doesn't directly produce a value.
For example, in Python, the ternary operator builds expressions, e.g.,
'odd' if x % 2 else 'even'
is an expression, so you can assign it to a variable, print it, etc.
While the following is a statement:
if x % 2:
    'odd'
else:
    'even'
It is not reduced to a value by Python; it couldn't be printed, assigned to a variable, etc.
So far we have focused on the semantic differences between expressions and statements. But for a casual user, the differences are more noticeable on the syntactic level: there are places where a statement is expected and places where an expression is expected. For example, you cannot put a statement on the right-hand side of the assignment operator.
OCaml/Reason/Haskell/F# story
In OCaml, Reason, and F#, such constructs as if, while, and print are expressions. They all evaluate to values and can occur on the right-hand side of the assignment operator. So it looks like there is no distinction between statements and expressions; indeed, there are no statements in the OCaml grammar at all. I believe F# and Reason also avoid the word statement, to exclude confusion. However, there are syntactic forms that are not expressions. For example:
open Core_kernel
is definitely not an expression, and
type students = student list
is not an expression.
So what are they? In OCaml parlance, they are called definitions, and they are syntactic constructs that can appear in a module at the so-called top level. For example, in OCaml there are value definitions, which look like this:
let harry = student "Harry"
let larry = student "Larry"
let group = [harry; larry]
Every line above is a definition, and every line contains an expression on the right-hand side of the = symbol. In OCaml there is also a let expression, of the form let <v> = <exp> in <exp>, which should not be confused with the top-level let definition.
Roughly the same is true for F# and Reason. It is also true for Haskell, which distinguishes between expressions and declarations. In fact, it is probably true of every real-world language (i.e., excluding Brainfuck and other toy languages).
Summary
So, all these languages have syntactic forms that are not expressions. They are not called statements per se, but we can treat them as statements, so there is still a distinction between statements and expressions. The main difference from common imperative languages is that some well-known statements (e.g., if, while, for) are expressions in OCaml/F#/Reason/Haskell, and this is why people commonly say that there is no distinction between expressions and statements.
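As a small Haskell illustration of that last point (the names are mine), if/then/else is an expression and can appear on the right-hand side of a binding, just like the Python ternary above:

parity :: Int -> String
parity x = if odd x then "odd" else "even"  -- an expression, not a statement

main :: IO ()
main = putStrLn (parity 3)  -- prints "odd"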
From wiki.haskell.org:
First of all, common subexpression elimination (CSE) means that if an expression appears in several places, the code is rearranged so that the value of that expression is computed only once. For example:
foo x = (bar x) * (bar x)
might be transformed into
foo x = let x' = bar x in x' * x'
thus, the bar function is only called once. (And if bar is a particularly expensive function, this might save quite a lot of work.)
GHC doesn't actually perform CSE as often as you might expect. The trouble is, performing CSE can affect the strictness/laziness of the program. So GHC does do CSE, but only in specific circumstances --- see the GHC manual. (Section??)
Long story short: "If you care about CSE, do it by hand."
I'm wondering under what circumstances CSE "affects" the strictness/laziness of the program and what kind of effect that could be.
The naive CSE rule would be
e'[e, e] ~> let x = e in e'[x, x].
That is, whenever a subexpression e occurs twice in the expression e', we use a let-binding to compute e once. This, however, trivially leads to space leaks. For example,
sum [1..n] + product [1..n]
is typically O(1) space usage in a lazy functional programming language like Haskell (as sum and product would tail-recurse and blah blah blah), but becomes O(n) when the naive CSE rule is applied. This can be terrible for programs when n is large!
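Concretely, here is a before/after sketch (the names are mine): the naive rule introduces sharing, which keeps the entire list alive while sum traverses it, because product still needs it afterwards.

-- Before CSE: each consumer streams its own [1..n]; O(1) space.
separate :: Integer -> Integer
separate n = sum [1..n] + product [1..n]

-- After naive CSE: the shared list is retained in full; O(n) space.
shared :: Integer -> Integer
shared n = let xs = [1..n] in sum xs + product xs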
The approach, then, is to make this rule more specific, restricting it to a small set of cases that we know won't have the problem. We can begin by enumerating the problems with the naive rule more precisely, which gives us a set of priorities for developing a better CSE:
- The two occurrences of e might be far apart in e', leading to a long lifetime for the let x = e binding.
- The let-binding must always allocate a closure where previously there might not have been one.
- This can create an unbounded number of closures.
- There are cases where the closure might never be deallocated.
Something better
let x = e in e'[e] ~> let x = e in e'[x]
This is a more conservative rule but is much safer. Here we recognize that e appears twice but the first occurrence syntactically dominates the second expression, meaning here that the programmer has already introduced a let-binding. We can safely just reuse that let-binding and replace the second occurrence of e with x. No new closures are allocated.
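In Haskell syntax the rule looks like this, where expensive is a hypothetical placeholder for some costly computation:

expensive :: Int -> Int
expensive n = sum [1..n]  -- stands in for anything costly

-- The programmer already introduced the binding, so the transformation only
-- replaces the second occurrence; no new closure is allocated.
before, after :: Int -> Int
before n = let x = expensive n in x + expensive n
after  n = let x = expensive n in x + x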
Another example of syntactic domination:
case e of { x -> e'[e] } ~> case e of { x -> e'[x] }
And yet another:
case e of {
  Constructor x0 x1 ... xn ->
    e'[e]
}
~>
case e of {
  Constructor x0 x1 ... xn ->
    e'[Constructor x0 x1 ... xn]
}
These rules all take advantage of existing structure in the program to ensure that the kinetics of space usage remain the same before and after the transformation. They are much more conservative than the original CSE but they are also much safer.
See also
For a full discussion of CSE in a lazy FPL, read Chitil's (very accessible) 1997 paper. For a full treatment of how CSE works in a production compiler, see GHC's CSE.hs module, which is documented very thoroughly thanks to GHC's tradition of writing long footnotes. The comment-to-code ratio in that module is off the charts. Also note how old that file is (1993)!
I want to define a function that has the following properties:
symmetricLazyOr :: Bool -> Bool -> Bool
symmetricLazyOr True _|_ === True
symmetricLazyOr _|_ True === True
And otherwise it works like the normal or.
Is it even possible in Haskell?
UPDATE
This question focuses on semantics rather than implementation details. Intuitively, or should be symmetric, which means or a b === or b a for all given a and b. However, this is not true in Haskell, since or _|_ True === _|_ whilst or True _|_ === True.
In other words, you're looking for a function that, given two arguments, attempts to evaluate them both and is true if either argument is true? And in particular, a True result will be returned so long as at least one argument is True and not bottom?
Assuming that's correct, this is possible, but not purely. In order to implement it, you need to race two threads to evaluate each of the branches. The unamb package has some functions for dealing with cases like this (including the parallel-or function por). Another option is lvish, which should also work in this case as I understand it.
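A minimal sketch using por from the unamb package mentioned above (assuming the package is installed):

import Data.Unamb (por)

-- por races both arguments; whichever yields True first wins.
symmetricLazyOr :: Bool -> Bool -> Bool
symmetricLazyOr = por

main :: IO ()
main = do
  print (symmetricLazyOr True undefined)  -- True
  print (symmetricLazyOr undefined True)  -- True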
I'm pretty new to Haskell, but if you make an if statement:
function a b c
  | (a+b == 0) = True
  | --etc.
  | otherwise = False
Is the second if statement the same as an else if in other languages, or is it just another if? I assume it's the former, as you can only have one output, but I just want to make sure.
The construct you used is called a guard. Haskell checks the given alternatives one after another until one condition yields True. It then evaluates the right hand side of that equation.
You could pretty well write
function n
  | n == 1 = ...
  | n == 2 = ...
  | n >= 3 = ...
thus guards kind of represent an if/else-if construct from other languages. As otherwise is simply defined as True, the last
| otherwise =
will always be true and therefore represents a catch-all else clause.
Nonetheless, Haskell also has the usual conditional, which is an expression: a = if foo then 23 else 42.
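To see the correspondence, here is a sketch of the same function written both ways (the names are mine):

classify :: Int -> String
classify n
  | n == 1    = "one"
  | n == 2    = "two"
  | otherwise = "many"

-- The equivalent if/else-if chain, written as a single expression:
classify' :: Int -> String
classify' n = if n == 1 then "one" else if n == 2 then "two" else "many"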
What you have here is not really an if statement, but rather a guard. But you are right that the second case gets "executed" only if the previous cases (by cases here I mean the expressions between the | and =) did not match (evaluate to True). otherwise is just a synonym for True (that way it always "matches").
It must be like an else if.
The bottom guard, otherwise, is really just True, so if none of the earlier, more specific guards wins, you always get the otherwise value.
Correct. Though you've used guards, what you've expressed is more or less identical to an if statement. Evaluation falls through the guards you've written in the order they were listed:
(a+b == 0)
Will be checked first
etc.
Will be checked second and so forth, provided no preceding conditional is true.
otherwise
Will be checked last, provided no preceding conditional is true.