Why does OCaml sometimes require eta expansion?

If I have the following OCaml function:
let myFun = CCVector.map ((+) 1);;
It works fine in Utop, and Merlin doesn't mark it as a compilation error. When I try to compile it, however, I get the following error:
Error: The type of this expression,
(int, '_a) CCVector.t -> (int, '_b) CCVector.t,
contains type variables that cannot be generalized
If I eta-expand it however then it compiles fine:
let myFun foo = CCVector.map ((+) 1) foo;;
So I was wondering why it doesn't compile in eta-reduced form, and also why the eta-reduced form seems to work in the toplevel (Utop) but not when compiling?
Oh, and the documentation for CCVector is here. The '_a part can be either `RO or `RW, depending whether it is read-only or mutable.

What you have run into here is the value restriction (also called value polymorphism) of the ML language family.
The aim of the restriction is to reconcile let-polymorphism with side effects. For example, in the following definition:
let r = ref None
r cannot have a polymorphic type 'a option ref. Otherwise:
let () =
  r := Some 1;                 (* use r as int option ref *)
  match !r with
  | Some s -> print_string s   (* this time, use r as a different type, string option ref *)
  | None -> ()
would be wrongly accepted as well typed, but it would crash at run time, since the reference cell r is used at two incompatible types.
To fix this issue, much research was done in the 1980s, and the value restriction is one result. It restricts polymorphism to let bindings whose definitions are syntactically "non-expansive". An eta-expanded form is non-expansive, therefore your eta-expanded version of myFun gets a polymorphic type, but the eta-reduced one does not. (More precisely, OCaml uses a relaxed version of the value restriction, but the story is basically the same.)
When the definition of a let binding is expansive, no polymorphism is introduced, and the type variables are left non-generalized. These types are printed as '_a in the toplevel, and their intuitive meaning is: they must be instantiated to some concrete type later:
# let r = ref None (* expansive *)
val r : '_a option ref = {contents = None}
(* no polymorphism is allowed;
   the type checker does not reject this, hoping '_a is instantiated later *)
We can fix the type '_a after the definition:
# r := Some 1;; (* fixing '_a to int *)
- : unit = ()
# r;;
- : int option ref = {contents = Some 1} (* Now '_a is unified with int *)
Once fixed, you cannot change the type, which prevents the crash above.
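For instance, trying to use r at a different type afterwards is rejected (a toplevel sketch; the exact message wording may vary between OCaml versions):
# r := Some "foo";;
Error: This expression has type string but an expression was expected of type int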
This typing delay is permitted until the end of the type checking of the compilation unit. The toplevel is a unit that never ends, so you can have values with '_a type variables anywhere in the session. But under separate compilation, '_a variables must be instantiated to some type without type variables by the end of the .ml file:
(* test.ml *)
let r = ref None (* r : '_a option ref *)
(* end of test.ml. Typing fails because a non-generalizable type variable remains. *)
This is what is happening with your myFun function with the compiler.
AFAIK, there is no perfect solution to the problem of polymorphism and side effects. Like other solutions, the value restriction has its own drawback: if you want a polymorphic value, you must make its definition non-expansive, i.e. you must eta-expand myFun. This is a bit clumsy, but it is considered acceptable.
You can read some other answers:
http://caml.inria.fr/pub/old_caml_site/FAQ/FAQ_EXPERT-eng.html#variables_de_types_faibles
What is the difference between 'a and '_l?
or search for something like "value restriction ml".

Implicit, static type cast (coercion) in Haskell

Problem
Consider the following design problem in Haskell. I have a simple, symbolic EDSL in which I want to express variables and general expressions (multivariate polynomials) such as x^2 * y + 2*z + 1. In addition, I want to express certain symbolic equations over expressions, say x^2 + 1 = 1, as well as definitions, like x := 2*y - 2.
The goal is to:
Have a separate type for variables and general expressions: certain functions might be applied to variables and not complex expressions. For instance, a definition operator := might be of type (:=) :: Variable -> Expression -> Definition, and it should not be possible to pass a complex expression as its left-hand side parameter (though it should be possible to pass a variable as its right-hand side parameter, without explicit casting).
Have expressions be an instance of Num, so that it's possible to promote integer literals to expressions and use a convenient notation for common algebraic operations like addition or multiplication, without introducing some auxiliary wrapper operators.
In other words, I would like to have an implicit and static type cast (coercion) of variables to expressions. Now, I know that as such, there are no implicit type casts in Haskell. Nevertheless, certain object-oriented programming concepts (simple inheritance, in this case) are expressible in Haskell's type system, either with or without language extensions. How could I satisfy both above points while keeping a lightweight syntax? Is it even possible?
Discussion
It is clear that the main problem here is Num's type restriction, e.g.
(+) :: Num a => a -> a -> a
In principle, it's possible to write a single (generalised) algebraic data type for both variables and expressions. Then, one could write := in such a way, that the left-hand side expression is discriminated and only a variable constructor is accepted, with a run-time error otherwise. That's however not a clean, static (i.e. compile-time) solution...
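For illustration, that dynamically checked definition operator might look like this (a minimal sketch; the Expr and Definition types and their constructors are hypothetical, not part of the question):
data Expr = Var String | Lit Integer | Add Expr Expr | Mul Expr Expr

data Definition = Define String Expr

define :: Expr -> Expr -> Definition
define (Var v) rhs = Define v rhs                                    -- ok: LHS is a variable
define _       _   = error "LHS of a definition must be a variable"  -- run-time failure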
Example
Ideally, I would like to achieve a lightweight syntax such as
computation = do
  x <- variable
  t <- variable
  t |:=| x^2 - 1
  solve (t |==| 0)
In particular, I want to forbid notation like t + 1 |:=| x^2 - 1, since := should give a definition of a variable and not an entire left-hand side expression.
To leverage polymorphism rather than subtyping (because that's all you have in Haskell), don't think "a variable is an expression", but "both variables and expressions have some operations in common". Those operations can be put in a type class:
class HasVar e where fromVar :: Variable -> e
instance HasVar Variable where fromVar = id
instance HasVar Expression where ...
Then, rather than casting things, make things polymorphic. If you have v :: forall e. HasVar e => e, it can be used both as an expression and as a variable.
example :: (forall e. HasVar e => e) -> Definition
example v = (v := v) -- v can be used as both Variable and Expression
  where
    (:=) :: Variable -> Expression -> Definition
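For concreteness, the types and the elided instance might look like this (a minimal sketch; the Expression constructors are assumptions):
newtype Variable = Variable String

data Expression
  = Var Variable
  | Lit Integer
  | Add Expression Expression
  | Mul Expression Expression

-- With the HasVar class from above:
instance HasVar Expression where
  fromVar = Var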
Skeleton to make the code below typecheck: https://gist.github.com/Lysxia/da30abac357deb7981412f1faf0d2103
computation :: Solver ()
computation = do
  V x <- variable
  V t <- variable
  t |:=| x^2 - 1
  solve (t |==| 0)

How does Haskell type-check infinite recursive values?

Define this data type:
data NaturalNumber = Zero | S NaturalNumber
deriving (Show)
In Haskell (compiled using GHC), this code will run without warning or error:
infinity = S infinity
inf1 = S inf2
inf2 = S inf1
So both recursive and mutually recursive infinitely deep values pass the type check.
However, the following code gives an error:
j = S 'h'
The error states: Couldn't match expected type ‘NaturalNumber’ with actual type ‘Char’. The (same) error persists even if I set
j = S (S (S (S ... (S 'h')...)))
with a hundred or so nested S's.
How can Haskell tell that infinity is a valid member of NaturalNumber but j is not?
Interestingly, it also allows:
bottom = bottom
k = S bottom
Does Haskell merely try to prove incorrectness of a program, and if it fails to do so then allows it? Or is Haskell's type system not Turing complete, so if it allows the program then the program is provably (at the type level) correct?
(If the type system (in the formal semantics of Haskell, instead of only the type checker) is Turing complete, then it will either fail to realize some correctly typed programs are correct or some incorrectly typed programs are incorrect, due to the undecidability of the halting problem.)
Well
S :: NaturalNumber -> NaturalNumber
In
infinity = S infinity
We start by assuming nothing: we assign infinity some unsolved type _a and try to figure out what it is. We know that we have applied S to infinity, so _a must be whatever is on the left side of the arrow in the constructor’s type, which is NaturalNumber. We know that infinity is the result of an application of S, so infinity :: NaturalNumber, again (if we got two conflicting types for this definition, we’d have to emit a type error).
Similar reasoning holds for the mutually recursive definitions. inf1 must be a NaturalNumber because it appears as an argument to S in inf2; inf2 must be a NaturalNumber because it is the result of S; etc.
The general algorithm is to assign definitions unknown types (notable exceptions are literals and constructors), and then to create constraints on those types by seeing how every definition is used. E.g. this must be some form of list because it’s being reversed, and this must be an Int because it’s being used to look up a value from an IntMap, etc.
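As a small illustration (a sketch; the names table and f are invented), the uses alone pin the types down:
import qualified Data.IntMap as IntMap

table :: IntMap.IntMap String
table = IntMap.fromList [(1, "one"), (2, "two")]

-- Inferred without any signature: f :: [a] -> Int -> ([a], Maybe String)
-- xs must be a list because it is reversed;
-- k must be an Int because it is used as an IntMap key.
f xs k = (reverse xs, IntMap.lookup k table)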
In the case of
oops = S 'a'
'a' :: Char as it’s a literal, but, also, we must have 'a' :: NaturalNumber because it’s used as an argument to S. We get the obviously bogus constraint that the type of the literal must both be Char and NaturalNumber, which causes a type error.
And in
bottom = bottom
We start with bottom :: _a. The only constraint is _a ~ _a, because a value of type _a (bottom) is being used where a value of type _a is expected (on the RHS of the definition of bottom). Since there is nothing to further constrain the type, the unsolved type variable is generalized: it gets bound by a universal quantifier to produce bottom :: forall a. a.
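Spelled out (restating the example from the question, with the generalized signature made explicit):
bottom :: a   -- implicitly: forall a. a
bottom = bottom

k :: NaturalNumber
k = S bottom  -- well-typed; evaluating past the first S diverges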
Note how both uses of bottom above have the same type (_a) while inferring the type of bottom. This breaks polymorphic recursion: every occurrence of a value within its definition is taken to be of the same type as the definition itself. E.g.
-- perfectly balanced binary trees
data Binary a = Leaf a | Branch (Binary (a, a))
-- headB :: _a -> _r
headB (Leaf x) = x -- _a ~ Binary _r; headB :: Binary _r -> _r
headB (Branch bin) = fst (headB bin)
-- recursive call has type headB :: Binary _r -> _r
-- but bin :: Binary (_r, _r); mismatch
So you need a type signature:
headB :: {-forall a.-} Binary a -> a
headB (Leaf x) = x
headB (Branch bin) = fst (headB {-#(a, a)-} bin)
-- knowing exactly what headB's signature is allows for polymorphic recursion
So: when something doesn't have a type signature, the type checker tries to assign it a type, and if it comes across a bogus constraint on its way, it rejects the program. When something has a type signature, the type checker descends into it to make sure it's correct (tries to prove it wrong, if you prefer to think of it that way).
Haskell’s type system is not Turing complete because there are heavy syntactic restrictions to prevent e.g. type lambdas (without language extensions), but it doesn’t suffice to make sure all programs run to completion without error because it still allows bottoms (not to mention all the unsafe functions). It provides the weaker guarantee that, if a program runs to completion without using an unsafe function, it will remain type-correct. Under GHC, with sufficient language extensions, the type system does become Turing complete. I don't think it allows ill-typed programs through; I think the most you can do is throw the compiler into an infinite loop.

Confused about Haskell polymorphic types

I have defined a function :
gen :: a -> b
So just trying to provide a simple implementation :
gen 2 = "test"
But throws error :
gen.hs:51:9:
Couldn't match expected type ‘b’ with actual type ‘[Char]’
‘b’ is a rigid type variable bound by
the type signature for gen :: a -> b at gen.hs:50:8
Relevant bindings include gen :: a -> b (bound at gen.hs:51:1)
In the expression: "test"
In an equation for ‘gen’: gen 2 = "test"
Failed, modules loaded: none.
So my function is not correct. Why is a not typed as Int and b not typed as String?
This is a very common misunderstanding.
The key thing to understand is that if you have a variable in your type signature, then the caller gets to decide what type that is, not you!
So you cannot say "this function returns type x" and then just return a String; your function actually has to be able to return any possible type that the caller may ask for. If I ask your function to return an Int, it has to return an Int. If I ask it to return a Bool, it has to return a Bool.
Your function claims to be able to return any possible type, but actually it only ever returns String. So it doesn't do what the type signature claims it does. Hence, a compile-time error.
A lot of people apparently misunderstand this. In (say) Java, you can say "this function returns Object", and then your function can return anything it wants. So the function decides what type it returns. In Haskell, the caller gets to decide what type is returned, not the function.
Edit: Note that the type you've written, a -> b, is impossible. No function can ever have this type. There's no way a function can construct a value of type b out of thin air. The only way this can work is if some of the inputs also involve type b, or if b belongs to some kind of typeclass which allows value construction.
For example:
head :: [x] -> x
The return type here is x ("any possible type"), but the input type also mentions x, so this function is possible; you just have to return one of the values that was in the original list.
Similarly, gen :: a -> a is a perfectly valid function. But the only thing it can do is return its input unchanged (i.e., what the id function does).
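The typeclass escape hatch mentioned above looks like this (a sketch; genNum is an invented name):
-- Possible because Num lets us conjure a value of any numeric type b
-- from an integer literal (via fromInteger):
genNum :: Num b => a -> b
genNum _ = 42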
This property of type signatures telling you what a function does is a very useful and powerful property of Haskell.
gen :: a -> b does not mean "for some type a and some type b, gen must be of type a -> b"; it means "for any type a and any type b, gen must be of type a -> b".
To motivate this: if the type checker sees something like let x :: Int = gen "hello", it sees that gen is used as String -> Int here, and then looks at gen's type to see whether it can be used that way. The type is a -> b, which can be specialized to String -> Int, so the type checker decides that this is fine and allows the call. That is, since the function is declared to have type a -> b, the type checker allows you to call the function with any type you want and allows you to use the result at any type you want.
However that clearly does not match the definition you gave the function. The function knows how to handle numbers as arguments - nothing else. And likewise it knows how to produce strings as its result - nothing else. So clearly it should not be possible to call the function with a string as its argument or to use the function's result as an Int. So since the type a -> b would allow that, it's clearly the wrong type for that function.
Your type signature gen :: a -> b states that your function works for any type a (and provides any type b the caller of the function demands).
Besides the fact that such a function is hard to come by, the line gen 2 = "test" tries to return a String, which very well may not be what the caller demands.
Excellent answers. Given your profile, however, you seem to know Java, so I think it's valuable to connect this to Java as well.
Java offers two kinds of polymorphism:
Subtype polymorphism: e.g., every type is a subtype of java.lang.Object
Generic polymorphism: e.g., in the List<T> interface.
Haskell's type variables are a version of (2). Haskell doesn't really have a version of (1).
One way to think of generic polymorphism is in terms of templates (which is what C++ people call them): a type that has a type variable parameter is a template that can be specialized into a variety of monomorphic types. So for example, the interface List<T> is a template for constructing monomorphic interfaces like List<String>, List<List<String>> and so on, all of which have the same structure but differ only because the type variable T gets replaced uniformly throughout the signatures with the instantiation type.
The concept that "the caller chooses" that several responders have mentioned here is basically a friendly way of referring to instantiation. In Java, for example, the most common point where the type variable gets "chosen" is when an object is instantiated:
List<String> myList = new ArrayList<String>();
Second common point is that a subtype of a generic type may instantiate the supertype's variables:
class MyFunction implements Function<Integer, String> {
    public String apply(Integer i) { ... }
}
Third one is methods that allow the caller to instantiate a variable that's not a parameter of its enclosing type:
/**
 * Visitor-pattern style interface for a simple arithmetical language
 * abstract syntax tree.
 */
interface Expression {
    // The caller of `accept` implicitly chooses which type `R` is,
    // by supplying a `Visitor<R>` with `R` instantiated to something
    // of its choice.
    <R> R accept(Expression.Visitor<R> visitor);

    static interface Visitor<R> {
        R constant(int i);
        R add(Expression a, Expression b);
        R multiply(Expression a, Expression b);
    }
}
In Haskell, instantiation is carried out implicitly by the type inference algorithm. In any expression where you use gen :: a -> b, type inference will infer what types need to be instantiated for a and b, given the context in which gen is used. So basically, "caller chooses" means that any code that uses gen controls the types to which a and b will be instantiated; if I write gen [()], then I'm implicitly instantiating a to [()]. The error here means that your type declaration says that gen [()] is allowed, but your equation gen 2 = "test" implies that it's not.
In Haskell, type variables are implicitly quantified, but we can make this explicit:
{-# LANGUAGE ScopedTypeVariables #-}
gen :: forall a b . a -> b
gen x = ????
The "forall" is really just a type level version of a lambda, often written Λ. So gen is a function taking three arguments: a type, bound to the name a, another type, bound to the name b, and a value of type a, bound to the name x. When your function is called, it is called with those three arguments. Consider a saner case:
fst :: (a,b) -> a
fst (x1,x2) = x1
This gets translated to
fst :: forall (a::*) (b::*) . (a,b) -> a
fst = /\ (a::*) -> /\ (b::*) -> \ (x::(a,b)) ->
  case x of
    (x1, x2) -> x1
where * is the type (often called a kind) of normal concrete types. If I call fst (3::Int, 'x'), that gets translated into
fst Int Char (3Int, 'x')
where I use 3Int to represent specifically the Int version of 3. We could then calculate it as follows:
fst Int Char (3Int, 'x')
=
(/\ (a::*) -> /\ (b::*) -> \(x::(a,b)) -> case x of (x1,x2) -> x1) Int Char (3Int, 'x')
=
(/\ (b::*) -> \(x::(Int,b)) -> case x of (x1,x2) -> x1) Char (3Int, 'x')
=
(\(x::(Int,Char)) -> case x of (x1,x2) -> x1) (3Int, 'x')
=
case (3Int, 'x') of (x1,x2) -> x1
=
3Int
Whatever types I pass in, as long as the value I pass in matches, the fst function will be able to produce something of the required type. If you try to do this for a->b, you will get stuck.

SML conversions to Haskell

A few basic questions, for converting SML code to Haskell.
1) I am used to having local embedded expressions in SML code, for example test expressions, prints, etc., which function as local tests and produce output when the code is loaded (evaluated).
In Haskell it seems that the only way to get results (evaluation) is to add code in a module, and then go to main in another module and add something to invoke and print results.
Is this right? In GHCi I can type expressions and see the results, but can this be automated?
Having to go to the top level main for each test evaluation seems inconvenient to me - maybe just need to shift my paradigm for laziness.
2) in SML I can do pattern matching and unification on a returned result, e.g.
val myTag(x) = somefunct(a,b,c);
and get the value of x after a match.
Can I do something similar in Haskell easily, without writing separate extraction functions?
3) How do I do a constructor with a tuple argument, i.e. uncurried.
in SML:
datatype Thing = Info of Int * Int;
but in Haskell, I tried;
data Thing = Info (Int Int)
which fails ("Int is applied to too many arguments in the type: Int Int").
The curried version works fine,
data Thing = Info Int Int
but I wanted un-curried.
Thanks.
This question is a bit unclear -- you're asking how to evaluate functions in Haskell?
If it is about inserting debug and tracing into pure code, this is typically only needed for debugging. To do this in Haskell, you can use Debug.Trace.trace, in the base package.
If you're concerned about calling functions, Haskell programs evaluate from main downwards, in dependency order. In GHCi you can, however, import modules and call any top-level function you wish.
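A minimal sketch combining both points (inline tracing in pure code, and main as the entry point that drives the evaluation):
module Main where

import Debug.Trace (trace)

add :: Int -> Int -> Int
add a b = trace "add called" (a + b)  -- prints when this thunk is forced

main :: IO ()
main = print (add 1 2)  -- evaluation starts here, not at module load time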
You can return the original argument to a function, if you wish, by making it part of the function's result, e.g. with a tuple:
f x = (x, y)
where y = g a b c
Or do you mean to return either one value or another? Then use a tagged union (sum type), such as Either:
f x = if x > 0 then Left x
               else Right (g a b c)
How do I do a constructor with a tuple argument, i.e. uncurried in SML
Using the (,) constructor. E.g.
data T = T (Int, Int)
though more Haskell-like would be:
data T = T Int Bool
and those should probably be strict fields in practice:
data T = T !Int !Bool
Debug.Trace allows you to print debug messages inline. However, since these functions use unsafePerformIO, they might behave in unexpected ways compared to a call-by-value language like SML.
I think the @ syntax (an "as-pattern") is what you're looking for here:
data MyTag = MyTag Int Bool String

someFunct :: MyTag -> (MyTag, Int, Bool, String)
someFunct x@(MyTag a b c) = (x, a, b, c) -- x is bound to the entire argument
In Haskell, tuple types are separated by commas, e.g., (t1, t2), so what you want is:
data Thing = Info (Int, Int)
Reading the other answers, I think I can provide a few more examples and one recommendation.
data ThreeConstructors = MyTag Int | YourTag (String,Double) | HerTag [Bool]
someFunct :: Char -> Char -> Char -> ThreeConstructors
MyTag x = someFunct 'a' 'b' 'c'
This is like the "let MyTag x = someFunct a b c" examples, but it is at the top level of the module.
As you have noticed, Haskell's top level can define commands, but there is no way to automatically run any code merely because your module has been imported by another module. This is entirely different from Scheme or SML. In Scheme a file is interpreted by executing it form by form, but Haskell's top level is only declarations. Thus libraries cannot do things like run initialization code when loaded; they have to provide a "pleaseRunMe :: IO ()" kind of command to do any initialization.
As you point out, this means running all the tests requires some boilerplate code to list them all. You can look under Hackage's Testing group for libraries to help, such as test-framework-th.
For #2, yes, Haskell's pattern matching does the same thing. Both let and where do pattern matching. You can do
let MyTag x = someFunct a b c
in ...
or
...
where MyTag x = someFunct a b c

In Haskell, why do I need to specify type constraints? Why can't the compiler figure them out?

Consider the function,
add a b = a + b
This works:
*Main> add 1 2
3
However, if I add a type signature specifying that I want to add things of the same type:
add :: a -> a -> a
add a b = a + b
I get an error:
test.hs:3:10:
Could not deduce (Num a) from the context ()
arising from a use of `+' at test.hs:3:10-14
Possible fix:
add (Num a) to the context of the type signature for `add'
In the expression: a + b
In the definition of `add': add a b = a + b
So GHC clearly can deduce that I need the Num type constraint, since it just told me:
add :: Num a => a -> a -> a
add a b = a + b
Works.
Why does GHC require me to add the type constraint? If I'm doing generic programming, why can't it just work for anything that knows how to use the + operator?
In C++ template programming, you can do this easily:
#include <string>
#include <cstdio>
using namespace std;
template<typename T>
T add(T a, T b) { return a + b; }
int main()
{
    printf("%d, %f, %s\n",
           add(1, 2),
           add(1.0, 3.4),
           add(string("foo"), string("bar")).c_str());
    return 0;
}
The compiler figures out the types of the arguments to add and generates a version of the function for that type. There seems to be a fundamental difference in Haskell's approach, can you describe it, and discuss the trade-offs? It seems to me like it would be resolved if GHC simply filled in the type constraint for me, since it obviously decided it was needed. Still, why the type constraint at all? Why not just compile successfully as long as the function is only used in a valid context where the arguments are in Num?
If you don't want to specify the function's type, just leave it out and the compiler will infer the types automatically. But if you choose to specify the types, they have to be correct and accurate.
The entire point of types is to have a formal way to declare the right and wrong way to use a function. A type of (Num a) => a -> a -> a describes exactly what is required of the arguments. If you omitted the class constraint, you would have a more general function that could be used (erroneously) in more places.
And it’s not just preventing you from passing non-Num values to add. Everywhere the function goes, the type is sure to go. Consider this:
add :: a -> a -> a
add a b = a + b
foo :: [a -> a -> a]
foo = [add]
value :: [String]
value = [f "hello" "world" | f <- foo]
You want the compiler to reject this, right? How does it do that? By adding class constraints, and checking that they are not removed, even if you don’t directly name the function.
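With the constraint kept in both signatures, the bad comprehension is rejected as desired (a sketch; GHC's actual message will complain about a missing Num instance for String):
add :: Num a => a -> a -> a
add a b = a + b

foo :: Num a => [a -> a -> a]
foo = [add]

-- Now [f "hello" "world" | f <- foo] is rejected:
-- it would require a Num instance for String.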
What’s different in the C++ version? There are no class constraints. The compiler substitutes int or std::string for T, then tries to compile the resulting code and looks for a matching + operator that it can use. The template system is “looser”, since it accepts more invalid programs, and this is a symptom of it being a separate stage before compilation. I would love to modify C++ to add the <? extends T> semantics from Java’s generics. Just learn the type system and recognize that parametric polymorphism is “stronger” than C++ templates, namely it will reject more invalid programs.
I think you might be being tripped up by the "crazy moon poetry" of GHC's error messages. It's not saying that it (being GHC) couldn't deduce the (Num a) constraint. It is saying that the (Num a) constraint can't be deduced from your type signature, even though it knows the constraint must be there from the use of +. Hence, you are stating that this function has a type more general than the compiler knows it can have. The compiler doesn't want you lying about your functions to the world!
In the first example you gave, without the type signature, if you run :t add in ghci, you'll see that the compiler knows full well that the (Num a) constraint is there.
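For example, with the signature-less definition above:
*Main> :t add
add :: Num a => a -> a -> a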
As for C++'s templates, remember that they are syntactic templates and are only fully type checked in each instance as they are used. Your add template will work with any types so long as, at each place it is used, there is a suitable + operator and perhaps conversions, to make an instance of the template viable. No guarantees can be made about the template until then... which is why the body of the template must be "visible" to each module that uses it.
Basically, all C++ can do is validate the syntax of the template, and then keep it around as a kind of very hygienic macro. Whereas Haskell generates a real function for add (leaving aside that it may choose to also generate type specific specializations for optimization).
There are cases where the compiler can't figure out the right type for you and where it needs your help. Consider
f s = show $ read s
The compiler says:
Ambiguous type variable `a' in the constraints:
  `Read a' arising from a use of `read' at src\Main.hs:20:13-18
  `Show a' arising from a use of `show' at src\Main.hs:20:6-9
Probable fix: add a type signature that fixes these type variable(s)
(Strangely enough, it seems you can define this function in GHCi, but there seems to be no way to actually use it.)
If you want that something like f "1" works, you need to specify the type:
f s = show $ (read s :: Int)
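With the annotation in place, the function is usable:
*Main> f "1"
"1"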
