I should probably first mention that I'm pretty new to Haskell. Is there a particular reason to keep the let expression in Haskell?
I know that Haskell got rid of the rec keyword that corresponds to the Y-combinator portion of a let statement that indicates it's recursive. Why didn't they get rid of the let statement altogether?
If they did, code would read as more imperative to some degree. For example, something like:
let y = 1+2
    z = 4+6
in y+z
would just be:
y = 1+2
z = 4+6
y+z
Which is more readable and easier for someone new to functional programming to follow. The only reason I can think of to keep it around is something like this:
aaa = let y = 1+2
          z = 4+6
      in y+z
Which would look like this without the let, which I think ends up being ambiguous grammar:
aaa =
    y = 1+2
    z = 4+6
    y+z
But if Haskell didn't ignore whitespace, and code blocks/scope worked similar to Python, would it be able to remove the let?
Is there a stronger reason to keep around let?
Sorry if this question seems stupid, I'm just trying to understand more about why it's in there.
Syntactically you can easily imagine a language without let. Immediately, we can produce this in Haskell by simply relying on where if we wanted. Beyond that are many possible syntaxes.
Semantically, you might think that let could translate away to something like this
let x = e in g ==> (\x -> g) e
and, indeed, at runtime these two expressions are identical (modulo recursive bindings, but those can be achieved with fix). Traditionally, however, let has special typing semantics (along with where and top-level name definitions... all of which are, effectively, syntax sugar for let).
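For concreteness, here is a minimal sketch of that translation, including the fix trick for a recursive binding (the names five, fact and so on are just illustrative):

import Data.Function (fix)

-- The non-recursive translation at work: both definitions evaluate to 5.
five, five' :: Int
five  = let x = 2 in x + 3
five' = (\x -> x + 3) 2

-- A recursive let needs fix under this translation.
fact, fact' :: Integer -> Integer
fact  = let go n = if n <= 1 then 1 else n * go (n - 1) in go
fact' = fix (\go n -> if n <= 1 then 1 else n * go (n - 1))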
In particular, in the Hindley-Milner type system which forms the foundation of Haskell there's a notion of let-generalization. Intuitively, it regards situations where we upgrade functions to their most polymorphic form. In particular, if we have a function appearing in an expression somewhere with a type like
a -> b -> c
those variables, a, b, and c, may or may not already have meaning in that expression. In particular, they're assumed to be fixed yet unknown types. Compare that to the type
forall a b c. a -> b -> c
which includes the notion of polymorphism by stating, immediately, that even if there happen to be type variables a, b, and c available in the environment, these references are fresh.
This is an incredibly important step in the HM inference algorithm as it is how polymorphism is generated allowing HM to reach its more general types. Unfortunately, it's not possible to do this step whenever we please—it must be done at controlled points.
This is what let-generalization does: it says that types should be generalized to polymorphic types when they are let-bound to a particular name. Such generalization does not occur when they are merely passed into functions as arguments.
So, ultimately, you need a form of "let" in order to run the HM inference algorithm. Further, it cannot just be syntax sugar for function application despite them having equivalent runtime characteristics.
Syntactically, this "let" notion might be called let or where or by a convention of top-level name binding (all three are available in Haskell). So long as it exists and is a primary method for generating bound names where people expect polymorphism then it'll have the right behavior.
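A minimal sketch of that behaviour (nothing here is library-specific; ok and the commented-out bad are illustrative names):

-- The let-bound identity is generalized, so it can be used at two types:
ok :: (Bool, Char)
ok = let i x = x in (i True, i 'a')

-- A lambda-bound identity is not generalized, so this version is rejected:
-- bad = (\i -> (i True, i 'a')) (\x -> x)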
There are important reasons why Haskell and other functional languages use let. I'll try to describe them step by step:
Quantification of type variables
The Damas-Hindley-Milner type system used in Haskell and other functional languages allows polymorphic types, but the type quantifiers are allowed only in front of a given type expression. For example, if we write
const :: a -> b -> a
const x y = x
then the type of const is polymorphic, it is implicitly universally quantified as
∀a.∀b. a -> b -> a
and const can be specialized to any type that we obtain by substituting two type expressions for a and b.
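For instance, picking a = Int and b = Bool gives one particular specialization (constInt is just an illustrative name):

constInt :: Int -> Bool -> Int
constInt = const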
However, the type system doesn't allow quantifiers inside type expressions, such as
(∀a. a -> a) -> (∀b. b -> b)
Such types are allowed in System F, but then type checking and type inference are undecidable, which means that the compiler wouldn't be able to infer types for us and we would have to explicitly annotate expressions with types.
(For a long time, the question of the decidability of type checking in System F was open, and it was sometimes referred to as "an embarrassing open problem", because undecidability had been proven for many other systems but not for this one, until Joe Wells proved it in 1994.)
(GHC allows you to enable such explicit inner quantifiers using the RankNTypes extension, but as mentioned, the types can't be inferred automatically.)
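A small sketch of what that looks like in practice; the rank-2 type below is simpler than the one above, and it has to be written out by hand because GHC will not infer it:

{-# LANGUAGE RankNTypes #-}

-- The argument keeps its own quantifier, so f can be used at two types:
useTwice :: (forall a. a -> a) -> (Int, Bool)
useTwice f = (f 0, f True)

-- useTwice id  ==>  (0, True)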
Types of lambda abstractions
Consider the expression λx.M, or in Haskell notation \x -> M,
where M is some term containing x. If the type of x is a and the type of M is b, then the type of the whole expression will be λx.M : a → b. Because of the above restriction, a must not contain ∀, therefore the type of x can't contain type quantifiers, it can't be polymorphic (or in other words it must be monomorphic).
Why lambda abstraction isn't enough
Consider this simple Haskell program:
i :: a -> a
i x = x
foo :: a -> a
foo = i i
Let's disregard for now that foo isn't very useful. The main point is that i in the definition of foo is instantiated with two different types. The first one
i :: (a -> a) -> (a -> a)
and the second one
i :: a -> a
Now if we try to convert this program into the pure lambda calculus syntax without let, we'd end up with
(λi.i i)(λx.x)
where the first part is the definition of foo and the second part is the definition of i. But this term will not type check. The problem is that i must have a monomorphic type (as described above), but we need it polymorphic so that we can instantiate i to the two different types.
Indeed, if you try to typecheck \i -> i i in Haskell, it will fail. There is no monomorphic type we can assign to i so that i i would typecheck.
let solves the problem
If we write let i x = x in i i, the situation is different. Unlike in the previous paragraph, there is no lambda here, no self-contained expression like λi.i i where we'd need a polymorphic type for the abstracted variable i. Therefore let can allow i to have a polymorphic type, in this case ∀a.a → a, and so i i typechecks.
Without let, if we compiled a Haskell program and converted it to a single lambda term, every function would have to be assigned a single monomorphic type! This would be pretty useless.
So let is an essential construct that allows polymorphism in languages based on Damas-Hindley-Milner type systems.
The History of Haskell speaks a bit to the fact that Haskell has long since embraced a complex surface syntax.
It took some while to identify the stylistic choice as we have done here, but once we had done so, we engaged in furious debate about which style was “better.” An underlying assumption was that if possible there should be “just one way to do something,” so that, for example, having both let and where would be redundant and confusing.
In the end, we abandoned the underlying assumption, and provided full syntactic support for both styles. This may seem like a classic committee decision, but it is one that the present authors believe was a fine choice, and that we now regard as a strength of the language. Different constructs have different nuances, and real programmers do in practice employ both let and where, both guards and conditionals, both pattern-matching definitions and case expressions—not only in the same program but sometimes in the same function definition. It is certainly true that the additional syntactic sugar makes the language seem more elaborate, but it is a superficial sort of complexity, easily explained by purely syntactic transformations.
This is not a stupid question. It is completely reasonable.
First, let/in bindings are syntactically unambiguous and can be rewritten in a simple mechanical way into lambdas.
Second, and because of this, let ... in ... is an expression: that is, it can be written wherever expressions are allowed. In contrast, your suggested syntax is more similar to where, which is bound to a surrounding syntactic construct, like the pattern matching line of a function definition.
One might also make an argument that your suggested syntax is too imperative in style, but this is certainly subjective.
You might prefer using where to let. Many Haskell developers do. It's a reasonable choice.
There are good reasons why let is there (a short sketch follows this list):
let can be used within the do notation.
It can be used within list comprehension.
It can conveniently be used within function definitions, as mentioned here.
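A small sketch of those three places; all names below are made up for illustration:

sumAndPrint :: IO ()
sumAndPrint = do
  let total = 1 + 2                     -- let inside do notation
  print total

squaresOfEvens :: [Int]
squaresOfEvens = [sq | x <- [1..10], even x, let sq = x * x]   -- let inside a list comprehension

area :: Double -> Double
area r = let piApprox = 3.14159 in piApprox * r * r            -- let inside an ordinary definition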
You give the following example as an alternative to let:
y = 1+2
z = 4+6
y+z
The above example will not typecheck, and y and z would also pollute the global namespace, which can be avoided by using let.
Part of the reason Haskell's let looks like it does is also the consistent way it manages its indentation sensitivity. Every indentation-sensitive construct works the same way: first there's an introducing keyword (let, where, do, of); then the next token's position determines what is the indentation level for this block; and subsequent lines that start at the same level are considered to be a new element in the block. That's why you can have
let a = 1
    b = 2
in a + b
or
let
  a = 1
  b = 2
in a + b
but not
let a = 1
  b = 2
in a + b
I think it might actually be possible to have keywordless indentation-based bindings without making the syntax technically ambiguous. But I think there is value in the current consistency, at least for the principle of least surprise. Once you see how one indentation-sensitive construct works, they all work the same. And as a bonus, they all have the same indentation-insensitive equivalent. This
keyword <element 1>
        <element 2>
        <element 3>
is always equivalent to
keyword { <element 1>; <element 2>; <element 3> }
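For example, the layout form of let used above can always be rewritten with explicit braces and semicolons (sumLayout and sumBraces are illustrative names):

sumLayout :: Int
sumLayout = let a = 1
                b = 2
            in a + b

sumBraces :: Int
sumBraces = let { a = 1; b = 2 } in a + b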
In fact, as a mainly F# developer, this is something I envy from Haskell: F#'s indentation rules are more complex and not always consistent.
Why do we need Control.Lens.Reified? Is there some reason I can't place a Lens directly into a container? What does reify mean anyway?
We need reified lenses because Haskell's type system is predicative. I don't know the technical details of exactly what that means, but it prohibits types like
[Lens s t a b]
For some purposes, it's acceptable to use
Functor f => [(a -> f b) -> s -> f t]
instead, but when you reach into that, you don't get a Lens; you get a LensLike specialized to some functor or another. The ReifiedBlah newtypes let you hang on to the full polymorphism.
Operationally, [ReifiedLens s t a b] is a list of functions each of which takes a Functor f dictionary, while forall f . Functor f => [LensLike f s t a b] is a function that takes a Functor f dictionary and returns a list.
As for what "reify" means, well, the dictionary will say something, and that seems to translate into a rather stunning variety of specific meanings in Haskell. So no comment on that.
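Here is a self-contained sketch of the idea; it deliberately avoids importing the real lens package, so every name below is a local stand-in rather than the library's own definition:

{-# LANGUAGE RankNTypes #-}

type Lens s t a b = forall f. Functor f => (a -> f b) -> s -> f t

newtype ReifiedLens s t a b = ReifiedLens { runReifiedLens :: Lens s t a b }

_1 :: Lens (a, c) (b, c) a b
_1 f (a, c) = (\b -> (b, c)) <$> f a

_2 :: Lens (c, a) (c, b) a b
_2 f (c, a) = (\b -> (c, b)) <$> f a

-- [Lens (Int, Int) (Int, Int) Int Int] would need impredicative types,
-- but a list of the reified wrappers is fine, and unwrapping one gives
-- back the fully polymorphic lens:
myLenses :: [ReifiedLens (Int, Int) (Int, Int) Int Int]
myLenses = [ReifiedLens _1, ReifiedLens _2]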
The problem is that, in Haskell, type abstraction and application are completely implicit; the compiler is supposed to insert them where needed. Various attempts at designing 'impredicative' extensions, where the compiler would make clever guesses about where to put them, have failed; so the safest thing ends up being to rely on the Haskell 98 rules:
Type abstractions occur only at the top level of a function definition.
Type applications occur immediately whenever a variable with a polymorphic type is used in an expression.
So if I define a simple lens:[1]
lensHead f [] = pure []
lensHead f (x:xn) = (:xn) <$> f x
and use it in an expression:
[lensHead]
lensHead gets automatically applied to some set of type parameters; at which point it's no longer a lens, because it's not polymorphic in the functor anymore. The take-away is: an expression always has some monomorphic type; so it's not a lens. (You'll note that the lens functions take arguments of type Getter and Setter, which are monomorphic types, for similar reasons to this. But a [Getter s a] isn't a list of lenses, because they've been specialized to only getters.)
What does reify mean? The dictionary definition is 'make real'. 'Reifying' is used in philosophy to refer to the act of regarding or treating something as real (rather than ideal or abstract). In programming, it tends to refer to taking something that normally can't be treated as a data structure and representing it as one. For example, in really old Lisps, there didn't use to be first-class functions; instead, you had to use S-Expressions to pass 'functions' around, and eval them when you needed to call the function. The S-Expressions represented the functions in a way you could manipulate in the program, which is referred to as reification.
In Haskell, we don't typically need such elaborate reification strategies as Lisp S-Expressions, partly because the language is designed to avoid needing them; but since
newtype ReifiedLens s t a b = ReifiedLens (Lens s t a b)
has the same effect of taking a polymorphic value and turning it into a true first-class value, it's referred to as reification.
Why does this work, if expressions always have monomorphic types? Well, because the Rank2Types extension adds a third rule:
Type abstractions occur at the top-level of the arguments to certain functions, with so-called rank 2 types.
ReifiedLens is such a rank-2 function; so when you say
ReifiedLens l
you get a type lambda around the argument to ReifiedLens, and then l is applied immediately to the lambda-bound type argument. So l is effectively just eta-expanded. (Compilers are free to eta-reduce this and just use l directly.)
Then, when you say
f (ReifiedLens l) = ...
on the right-hand side, l is a variable with a polymorphic type, so every use of l is immediately and implicitly applied to whatever type arguments are needed for the expression to type-check. So everything works the way you expect.
The other way to think about is that, if you say
newtype ReifiedLens s t a b = ReifiedLens { unReify :: Lens s t a b }
the two functions ReifiedLens and unReify act like explicit type abstraction and application operators; this allows the compiler to identify where you want the abstractions and applications to take place well enough that the issues with impredicative type systems don't come up.
[1] In lens terminology, this is apparently called something other than a 'lens'; my entire knowledge of lenses comes from SPJ's presentation on them so I have no way to verify that. The point remains, since the polymorphism is still necessary to make it work as both a getter and a setter.
Suppose I was to define (+) on Strings but not by giving an instance of Num String.
Why does Haskell now hide Num's (+) function? After all, the function I have provided:
(+) :: String -> String -> String
can be distinguished by the compiler from Prelude's (+). Why can't both functions exist in the same namespace, but with different, non-overlapping type signatures?
As long as there is no call to the function in the code, Haskell should not need to care that there's an ambiguity. Placing a call to the function with arguments would then determine the types, so that the appropriate implementation can be chosen.
Of course, once there is an instance Num String, there would actually be a conflict, because at that point Haskell couldn't decide based upon the parameter type which implementation to choose, if the function were actually called.
In that case, an error should be raised.
Wouldn't this allow function overloading without pitfalls/ambiguities?
Note: I am not talking about dynamic binding.
Haskell simply does not support function overloading (except via typeclasses). One reason for that is that function overloading doesn't work well with type inference. If you had code like f x y = x + y, how would Haskell know whether x and y are Nums or Strings, i.e. whether the type of f should be f :: Num a => a -> a -> a or f :: String -> String -> String?
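For comparison, the typeclass route looks something like the sketch below; the class and operator names are invented, and this is an illustration rather than a recommendation:

{-# LANGUAGE FlexibleInstances #-}

class MyPlus a where
  (+.) :: a -> a -> a

instance MyPlus [Char] where       -- i.e. String
  (+.) = (++)

instance MyPlus Int where
  (+.) = (+)

-- ghci> "foo" +. "bar"   ==> "foobar"
-- ghci> (2 :: Int) +. 3  ==> 5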
PS: This isn't really relevant to your question, but the types aren't strictly non-overlapping if you assume an open world, i.e. in some module somewhere there might be an instance for Num String, which, when imported, would break your code. So Haskell never makes any decisions based on the fact that a given type does not have an instance for a given typeclass. Of course, function definitions hide other function definitions with the same name even if there are no typeclasses involved, so as I said: not really relevant to your question.
Regarding why it's necessary for a function's type to be known at the definition site as opposed to being inferred at the call site: First of all, the call site of a function may be in a different module than the function definition (or in multiple different modules), so if we had to look at the call site to infer a function's type, we'd have to perform type checking across module boundaries. That is, when type checking a module, we'd also have to go through all the modules that import it, so in the worst case we'd have to recompile all modules every time we change a single one. This would greatly complicate and slow down the compilation process. More importantly, it would make it impossible to compile libraries, because it's the nature of libraries that their functions will be used by other code bases that the compiler does not have access to when compiling the library.
"As long as the function isn't called"
"At some point, when using the function"
No, no, no. In Haskell you don't think in terms of "before" or "the minute you do...", but define things once and for all time. That's most apparent in the runtime behaviour of variables, but it also translates to function signatures and class instances. This way, you don't have to do all the tedious thinking about compilation order, and you are safe from the many ways in which e.g. C++ templates/overloads often break horribly because of one tiny change in the program.
Also, I don't think you quite understand how Hindley-Milner works.
"Before you call the function, at which time you know the type of the argument, it doesn't need to know."
Well, you normally don't know the type of the argument! It may sometimes be explicitly given, but usually it's deduced from the other argument or the return type. For instance, in
map (+3) [5,6,7]
the compiler doesn't know what types the numeric literals have; it only knows that they are numbers. This way, you can evaluate the result as whatever you like, and that allows for things you could only dream of in other languages, for instance a symbolic type where
> map (+3) [5,6,7] :: [SymbolicNum]
[SymbolicPlus 5 3, SymbolicPlus 6 3, SymbolicPlus 7 3]
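A hedged sketch of what such a symbolic type could look like; the answer's SymbolicNum is not a real library type, and with a derived Show the literals print wrapped in their constructor:

data SymbolicNum
  = SymbolicLit Integer
  | SymbolicPlus SymbolicNum SymbolicNum
  | SymbolicTimes SymbolicNum SymbolicNum
  deriving Show

instance Num SymbolicNum where
  fromInteger = SymbolicLit
  (+)         = SymbolicPlus
  (*)         = SymbolicTimes
  negate      = error "negate: not needed for this sketch"
  abs         = error "abs: not needed for this sketch"
  signum      = error "signum: not needed for this sketch"

-- ghci> map (+3) [5,6,7] :: [SymbolicNum]
-- [SymbolicPlus (SymbolicLit 5) (SymbolicLit 3), ...]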
Does it help the compiler to optimise, or is it just surplus work to add additional type signatures? For example, one often sees:
foo :: a -> b
foo x = bar x
  where bar x = undefined
Rather than:
foo :: a -> b
foo x = bar x
  where bar :: a -> b
        bar x = undefined
If I omit the top level type signature, GHC gives me a warning, so if I don't get warnings I am quite confident my program is correct. But no warnings are issued if I omit the signature in a where clause.
There exists a class of local functions whose types cannot be written in Haskell (without using fancy GHC extensions, that is). For example:
f :: a -> (a, Int)
f h = g 1
  where g n = (h, n)
This is because while the a in the f type signature is polymorphic viewed from outside f, this is not so from within f. In g, it is just some unknown type, but not any type, and (standard) Haskell cannot express "the same type as the first argument of the function this one is defined in" in its type language.
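One of the "fancy GHC extensions" alluded to above is ScopedTypeVariables: the explicit forall brings a into scope for the where clause, making the local signature expressible:

{-# LANGUAGE ScopedTypeVariables #-}

f :: forall a. a -> (a, Int)
f h = g 1
  where
    g :: Int -> (a, Int)   -- this a is the a from f's signature
    g n = (h, n)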
Often definitions in where clauses are to avoid repeating yourself if a sub-expression occurs more than once in a definition. In such a case, the programmer thinks of the local definition as a simple stand-in for writing out the inline sub-expressions. You usually wouldn't explicitly type the inline sub-expressions, so you don't type the where definition either. If you're doing it to save on typing, then the type declaration would kill all your savings.
It seems quite common to introduce where to learners of Haskell with examples of that form, so they go on thinking that "normal style" is to not give type declarations for local definitions. At least, that was my experience learning Haskell. I've since found that many of my functions that are complicated enough to need a where block become rather inscrutable if I don't know the type of the local definitions, so I try to err towards always typing them now; even if I think the type is obvious while I'm writing the code, it may not be so obvious when I'm reading it after not having looked at it for a while. A little effort for my fingers is almost always outweighed by even one or two instances of having to run type inference in my head!
Ingo's answer gives a good reason for deliberately not giving a type to a local definition, but I suspect the main reason is that many programmers have assimilated the rule of thumb that type declarations be provided for top level definitions but not for local definitions from the way they learned Haskell.
Often where declarations are used for short, local things, which have simple types, or types that are easily inferred. As a result there's no benefit to the human or compiler to add the type.
If the type is complex, or cannot be inferred, then you might want to add the type.
While giving monomorphic type signatures can make top level functions faster, it isn't so much of a win for local definitions in where clauses, since GHC will inline and optimize away the definitions in most cases anyway.
Adding a type signature can make your code faster. Take for example the following program (Fibonacci):
result = fib 25
-- fib :: Int -> Int
fib x = if x<2 then 1 else (fib (x-1)) + (fib (x-2))
Without the annotation in the 2nd line, it takes 0.010 sec. to run.
With the Int -> Int annotation, it takes 0.002 sec.
This happens because if you don't say anything about fib, it is going to be typed as fib :: (Num a, Num a1, Ord a) => a -> a1, which means that during runtime, extra data structures ("dictionaries") will have to be passed between functions to represent the Num/Ord typeclasses.
What it says in the title. If I write a type signature, is it possible to algorithmically generate an expression which has that type signature?
It seems plausible that it might be possible to do this. We already know that if the type is a special-case of a library function's type signature, Hoogle can find that function algorithmically. On the other hand, many simple problems relating to general expressions are actually unsolvable (e.g., it is impossible to know if two functions do the same thing), so it's hardly implausible that this is one of them.
It's probably bad form to ask several questions all at once, but I'd like to know:
Can it be done?
If so, how?
If not, are there any restricted situations where it becomes possible?
It's quite possible for two distinct expressions to have the same type signature. Can you compute all of them? Or even some of them?
Does anybody have working code which does this stuff for real?
Djinn does this for a restricted subset of Haskell types, corresponding to a first-order logic. It can't manage recursive types or types that require recursion to implement, though; so, for instance, it can't write a term of type (a -> a) -> a (the type of fix), which corresponds to the proposition "if a implies a, then a", which is clearly false; you can use it to prove anything. Indeed, this is why fix gives rise to ⊥.
If you do allow fix, then writing a program to give a term of any type is trivial; the program would simply print fix id for every type.
Djinn is mostly a toy, but it can do some fun things, like deriving the correct Monad instances for Reader and Cont given the types of return and (>>=). You can try it out by installing the djinn package, or using lambdabot, which integrates it as the @djinn command.
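To get a feel for what such derivations look like, here are two types whose total inhabitants can be written down by hand; each term is exactly the kind of constructive proof Djinn searches for (the names are illustrative):

-- "(A implies B) and (B implies C) imply (A implies C)"
compose :: (a -> b) -> (b -> c) -> (a -> c)
compose f g = g . f

-- "(A and B imply C) implies (A implies (B implies C))"
curry' :: ((a, b) -> c) -> a -> b -> c
curry' f a b = f (a, b)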
Oleg at okmij.org has an implementation of this. There is a short introduction here but the literate Haskell source contains the details and the description of the process. (I'm not sure how this corresponds to Djinn in power, but it is another example.)
There are cases where there is no unique function:
fst', snd' :: (a, a) -> a
fst' (a,_) = a
snd' (_,b) = b
Not only this; there are cases where there are an infinite number of functions:
list0, list1, list2 :: [a] -> a
list0 l = l !! 0
list1 l = l !! 1
list2 l = l !! 2
-- etc.
-- Or
mkList0, mkList1, mkList2 :: a -> [a]
mkList0 _ = []
mkList1 a = [a]
mkList2 a = [a,a]
-- etc.
(If you only want total functions, then consider [a] as restricted to infinite lists for list0, list1 etc, i.e. data List a = Cons a (List a))
In fact, if you have recursive types, any types involving these correspond to an infinite number of functions. However, at least in the case above, there is a countable number of functions, so it is possible to create an (infinite) list containing all of them. But, I think the type [a] -> [a] corresponds to an uncountably infinite number of functions (again restrict [a] to infinite lists) so you can't even enumerate them all!
(Summary: there are types that correspond to a finite, countably infinite and uncountably infinite number of functions.)
This is impossible in general (especially for languages like Haskell that do not even have the strong normalization property), and only possible in some (very) special cases (and for more restricted languages), such as when the codomain type has only one constructor (for example, a function f :: forall a. a -> () can be determined uniquely). In order to reduce the set of possible definitions for a given signature to a singleton set with just one definition, one needs to give more restrictions (in the form of additional properties; and it is still difficult to imagine how this could be helpful without giving an example of use).
From the (n-)categorical point of view, types correspond to objects, terms correspond to arrows (constructors also correspond to arrows), and function definitions correspond to 2-arrows. The question is analogous to asking whether one can construct a 2-category with the required properties by specifying only a set of objects. It's impossible, since you need either an explicit construction for arrows and 2-arrows (i.e., writing terms and definitions), or a deductive system which allows one to deduce the necessary structure from a certain set of properties (which still need to be defined explicitly).
There is also an interesting question: given an ADT (i.e., a subcategory of Hask), is it possible to automatically derive instances for Typeable, Data (yes, using SYB), Traversable, Foldable, Functor, Pointed, Applicative, Monad, etc.? In this case, we have the necessary signatures as well as additional properties (for example, the monad laws; these properties cannot be expressed in Haskell, but they can be expressed in a language with dependent types). There are some interesting constructions:
http://ulissesaraujo.wordpress.com/2007/12/19/catamorphisms-in-haskell
which shows what can be done for the list ADT.
The question is actually rather deep and I'm not sure of the answer, if you're asking about the full glory of Haskell types including type families, GADT's, etc.
What you're asking is whether a program can automatically prove that an arbitrary type is inhabited (contains a value) by exhibiting such a value. A principle called the Curry-Howard Correspondence says that types can be interpreted as mathematical propositions, and the type is inhabited if the proposition is constructively provable. So you're asking if there is a program that can prove a certain class of propositions to be theorems. In a language like Agda, the type system is powerful enough to express arbitrary mathematical propositions, and proving arbitrary ones is undecidable by Gödel's incompleteness theorem. On the other hand, if you drop down to (say) pure Hindley-Milner, you get a much weaker and (I think) decidable system. With Haskell 98, I'm not sure, because type classes are supposed to be equivalent to GADTs.
With GADT's, I don't know if it's decidable or not, though maybe some more knowledgeable folks here would know right away. For example it might be possible to encode the halting problem for a given Turing machine as a GADT, so there is a value of that type iff the machine halts. In that case, inhabitability is clearly undecidable. But, maybe such an encoding isn't quite possible, even with type families. I'm not currently fluent enough in this subject for it to be obvious to me either way, though as I said, maybe someone else here knows the answer.
(Update:) Oh a much simpler interpretation of your question occurs to me: you may be asking if every Haskell type is inhabited. The answer is obviously not. Consider the polymorphic type
a -> b
There is no function with that signature (not counting something like unsafeCoerce, which makes the type system inconsistent).
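The only ways to "write" something of that type are non-total, which is exactly why it proves nothing (the names are illustrative):

absurd1 :: a -> b
absurd1 = undefined       -- bottom

absurd2 :: a -> b
absurd2 x = absurd2 x     -- loops forever, also bottom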
In Leroy's paper on how recursive modules are typed in OCaml, it is written that modules are checked in an environment made of approximations of module types:
module rec A = ... and B = ... and C = ...
An environment {A -> approx(A); B -> approx(B); C -> approx(C) } is first built, and then used to compute the types of A, B and C.
I noticed that, in some cases, approximations are not good enough, and typechecking fails. In particular, when putting compilation units sources in a recursive module definition, typechecking can fail whereas the compiler was able to compile the compilation units separately.
Coming back to my first example, I found that a solution would be to type A in the initial approximated environment, but then to type B in that initial environment extended with the new computed type of A, and to type C in the previous env with the new computed type of B, and so on.
Before investigating more, I would like to know if there is any prior work on this subject, showing that such a compilation scheme for recursive modules is either safe or unsafe ? Is there a counter-example showing an unsafe program correctly typed with this scheme ?
First, note that Leroy (or Ocaml) does not allow module rec without explicit signature annotations. So it's rather
module rec A : SA = ... and B : SB = ... and C : SC = ...
and the approximative environment is {A : approx(SA), B : approx(SB), C : approx(SC)}.
It is not surprising that some modules type-check when defined separately, but not when defined recursively. After all, the same is already true for core-language declarations: in a 'let rec', recursive occurrences of the bound variables are monomorphic, while with separated 'let' declarations, you can use previous variables polymorphically. Intuitively, the reason is that you cannot have all the knowledge that you'd need to infer the more liberal types before you have actually checked the definitions.
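A quick OCaml sketch of that core-language analogue (the names id2 and use2 are made up):

(* With separate lets, id is fully generalized before use is checked: *)
let id x = x
let use () = (id 1, id "one")   (* fine: id is used at two different types *)

(* Within a single recursive group the bound names are monomorphic, so the
   analogous definition is rejected with a type error (int vs string):

   let rec id2 x = x
   and use2 () = (id2 1, id2 "one")
*)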
Regarding your suggestion, the problem with it is that it makes the module rec construct asymmetric, i.e. order would matter in potentially subtle ways. That violates the spirit of recursive definitions (at least in ML), which should always be indifferent to ordering.
In general, the issue with recursive typing is not so much soundness, but rather completeness. You don't want a type system that is either undecidable in general, or whose specification is dependent on algorithmic artefacts (like checking order).
On a more general note, it is well-known that Ocaml's treatment of recursive modules is rather restrictive. There has been work, e.g. by Nakata & Garrigue, that pushes its limits further. However, I am convinced that ultimately, you won't be able to get as liberal as you'd like (and that applies to other aspects of its module type system as well, e.g. functors) without abandoning Ocaml's purely syntactic approach to module typing. But then, I'm biased. :)