How does this constitute a Haskell rigid type error? - haskell

This works:
data Wrapped a = Wrapped a
alpha :: IO s -> IO ()
alpha x = do
rv <- wrapit x
return ()
where
wrapit :: IO s -> IO (Wrapped s)
wrapit x' = do
a <- x'
return (Wrapped a)
This doesn't:
data Wrapped a = Wrapped a
alpha :: IO s -> IO ()
alpha x = do
rv <- wrapit
return ()
where
wrapit :: IO (Wrapped s)
wrapit = do
a <- x
return (Wrapped a)
Why?

This is due to the way in which type variables are scoped and quantified in standard Haskell. You can make the second version work like so:
{-# LANGUAGE RankNTypes, ScopedTypeVariables #-}
module RigidProblem where
data Wrapped a = Wrapped a
alpha :: forall s. IO s -> IO ()
alpha x = do
rv <- wrapit
return ()
where
wrapit :: IO (Wrapped s)
wrapit = do
a <- x
return (Wrapped a)
There are two changes: the RankNTypes and ScopedTypeVariables language extensions are enabled and the explicit forall s is added in the type signature of alpha. The first of the two extensions is what allows us to introduce the explicit forall s thus bringing the s into scope inside the body of alpha, whereas the second makes it so that the signature on wrapit is not taken by the type inference engine to contain an implicit forall s -- instead, the s in that signature is taken to name a type variable which should be in scope and is identified with it.
The current default situation in Haskell, if I understand correctly, is that all rigid type variables (meaning type variables occurring in type signatures explicitly provided by the programmer) are implicitly quantified and not lexically scoped, so that there is no way to refer to a rigid type variable from an outer scope in an explicit signature provided in an inner scope... (Oh bother, I'm sure someone could phrase it better than this.) Anyway, from the type checker's point of view, the s in alpha's signature and the one in wrapit's signature are totally unrelated and cannot be unified -- thus the error.
See this page from the GHC docs and this page from Haskell Prime wiki for more information.
Update: I just realised that I never explained why the first version works. For the sake of completeness: note that with the first version, you could use t in place of s in wrapit's signature and nothing would change. You could even take wrapit out of the where block and make it a separate top level function. The key point is that it is a polymorphic function, so that the type of wrapit x is determined by the type of x. No relation of the type variable used in the first version wrapit's signature to that used in alpha's signature would be of any use here. With the second version this is of course different and you have to resort to the above mentioned trickery to make wrapit's s be the same thing as alpha's s.

Michał Marczyk's answer above is correct, but it is worth noting that the second version does work if you remove the type signature of the wrapit function:
data Wrapped a = Wrapped a
alpha :: IO s -> IO ()
alpha x = do
rv <- wrapit
return ()
where
-- No type signature here!
wrapit = do
a <- x
return (Wrapped a)
That is, the problem isn't with the code itself; it's that Haskell 98 doesn't let you write a type signature for the wrapit function, because it includes a type variable bound by its context (the outer alpha function), and H98 doesn't have a way to express that. As Michał said, enabling ScopedTypeVariables allows you to do so.

Related

Heterogeneous Data.Map in Haskell

Is it possible to do heterogeneous Data.Map in Haskell with GADT instead of Dynamic? I tried to model heterogeneous collection as laid out in this answer:
{-# LANGUAGE GADTs #-}
class Contract a where
toString :: a -> String
data Encapsulated where
Encapsulate :: Contract a => a -> Encapsulated
getTypedObject :: Encapsulated -> a
getTypedObject (Encapsulate x) = x
The idea being that Encapsulated could be used to store different objects of TypeClass a, and then extracted at run-time for specific type.
I get error about type of x not matching Contract a. Perhaps I need to specify some kind of class constraints to tell GHC that type of x in Encapsulate x is same as a in Contract a?
T.hs:10:34:
Couldn't match expected type ‘a’ with actual type ‘a1’
‘a1’ is a rigid type variable bound by
a pattern with constructor
Encapsulate :: forall a. Contract a => a -> Encapsulated,
in an equation for ‘getTypedObject’
at T.hs:10:17
‘a’ is a rigid type variable bound by
the type signature for getTypedObject :: Encapsulated -> a
at T.hs:9:19
Relevant bindings include
x :: a1 (bound at T.hs:10:29)
getTypedObject :: Encapsulated -> a (bound at T.hs:10:1)
In the expression: x
In an equation for ‘getTypedObject’:
getTypedObject (Encapsulate x) = x
I am trying this approach because I have JSON objects of different types, and depending on the type that is decoded at run-time over the wire, we want to retrieve appropriate type-specific builder from Map (loaded at run-time IO in main from configuration files, and passed to the function) and pass it decoded JSON data of the same type.
Dynamic library would work here. However, I am interested in finding out if there are other possible approaches such as GADTs or datafamilies.
your problem is that you push the a out again (which will not work) - what you can do is using the contract internally like this:
useEncapsulateContract :: Encapsulated -> String
useEncapsulateContract (Encapsulate x) = toString x
basically the compiler is telling you everything you need to know: inside you have a forall a. Contract a (so basically a constraint a to be a Contract)
On getTypedObject :: Encapsulated -> a you don't have this constraint - you are telling the compiler: "look this works for every a I'll demand"
To get it there you would have to parametrize Encapsulated to Encapsulated a which you obviously don't want.
The second version (the internal I gave) works because you have the constraint on the data-constructor and so you can use it there
to extent this a little:
this
getTypedObject :: Contract a => Encapsulated -> a
getTypedObject (Encapsulate x) = x
wouldn't work either as now the you would have Contract a but still it could be two different types which just share this class.
And to give hints to the compiler that both should be the same you would have to parametrize Encapsulate again ....
right now by doing this:
Encapsulate :: Contract a => a -> Encapsulated
you erase that information
#Carsten answer is obviously right, but my two cents that helped me understand that before.
When you write:
getTypedObject :: Encapsulated -> a
What you are "saying" is:
getTypedObject is a function that can take a value of the Encapsulated type and its result can be used whenever any type whatsoever is needed.
You can't obviously satisfy that, and the compiler won't allow you to try. You can only use the knowledge about the value inside of Encapsulated to bring out something meaningful based on the Contract. In other words, if the Contract wasn't there, there'd be no way for you to do anything meaningful with that value.
The concept here can be succintly described as type erasure and is also present in other languages, C++ being one I know of. Hence the value is in erasing every information about the type except the things you want to preserve via the contract they satisfy. The downside is that getting the original types back requires runtime inspection.
As a bonus, here's how a dynamic approach might work:
{-# LANGUAGE GADTs #-}
import Unsafe.Coerce
data Encapsulated where
Encapsulate :: Show a => a -> Encapsulated
getTypedObject :: Encapsulated -> a
getTypedObject (Encapsulate x) = unsafeCoerce x
printString :: String -> IO ()
printString = print
x = Encapsulate "xyz"
y = getTypedObject x
main = printString y
But it's very easy to see how that could break, right? :)

Type inference with GADTs - a0 is untouchable

Lets say I have this program
{-# LANGUAGE GADTs #-}
data My a where
A :: Int -> My Int
B :: Char -> My Char
main :: IO ()
main = do
let x = undefined :: My a
case x of
A v -> print v
-- print x
compiles fine.
But when I comment in the print x, I get:
gadt.hs: line 13, column 12:
Couldn't match type ‘a0’ with ‘()’
‘a0’ is untouchable
inside the constraints (a1 ~ GHC.Types.Int)
bound by a pattern with constructor
Main.A :: GHC.Types.Int -> Main.My GHC.Types.Int,
in a case alternative
at /home/niklas/src/hs/gadt-binary.hs:13:5-7
Expected type: GHC.Types.IO a0
Actual type: GHC.Types.IO ()
In the expression: System.IO.print v
In a case alternative: Main.A v -> System.IO.print v
Why do I get this error in line 13 (A v -> print v) instead of only in the print x line?
I thought the first occurrence should determine the type.
Please enlighten me :)
Well, first note that this has nothing to do with the particular print x: you get the same error when ending main with e.g. putStrLn "done".
So the problem is indeed in the case block, namely in that only the last statement of a do is required to have the type of the do block's signature. The other statements merely have to be in the same monad, i.e. IO a0 rather than IO ().
Now, usually this a0 is inferred from the statement itself, so for instance you can write
do getLine
putStrLn "discarded input"
though getLine :: IO String rather than IO (). However, in your example the information print :: ... -> IO () comes from inside the case block, from a GADT match. And such GADT matches behave differently from other Haskell statements: basically, they don't let any type information escape its scope, because if the information came from the GADT constructor then it's not correct outside of the case.
In that particular example, it seems obvious enough that a0 ~ () has nothing at all to do with the a1 ~ Int from the GADT match, but in general, such a fact could only be proven if GHC traced for all type information where it came from. I don't know if that's even possible, it would certainly be more complicated than Haskell's Hindley-Milner system, which heavily relies on unifying type information, which essentially assumes that it doesn't matter where the information came from.
Therefore, GADT matches simply act as a rigid “type information diode”: the stuff inside can never be used to determine types on the outside, like that the case block as a whole should be IO ().
However, you can manually assert that, with the rather ugly
(case x of
A v -> print v
) :: IO ()
or by writing
() <- case x of
A v -> print v

When are type signatures necessary in Haskell?

Many introductory texts will tell you that in Haskell type signatures are "almost always" optional. Can anybody quantify the "almost" part?
As far as I can tell, the only time you need an explicit signature is to disambiguate type classes. (The canonical example being read . show.) Are there other cases I haven't thought of, or is this it?
(I'm aware that if you go beyond Haskell 2010 there are plenty for exceptions. For example, GHC will never infer rank-N types. But rank-N types are a language extension, not part of the official standard [yet].)
Polymorphic recursion needs type annotations, in general.
f :: (a -> a) -> (a -> b) -> Int -> a -> b
f f1 g n x =
if n == (0 :: Int)
then g x
else f f1 (\z h -> g (h z)) (n-1) x f1
(Credit: Patrick Cousot)
Note how the recursive call looks badly typed (!): it calls itself with five arguments, despite f having only four! Then remember that b can be instantiated with c -> d, which causes an extra argument to appear.
The above contrived example computes
f f1 g n x = g (f1 (f1 (f1 ... (f1 x))))
where f1 is applied n times. Of course, there is a much simpler way to write an equivalent program.
Monomorphism restriction
If you have MonomorphismRestriction enabled, then sometimes you will need to add a type signature to get the most general type:
{-# LANGUAGE MonomorphismRestriction #-}
-- myPrint :: Show a => a -> IO ()
myPrint = print
main = do
myPrint ()
myPrint "hello"
This will fail because myPrint is monomorphic. You would need to uncomment the type signature to make it work, or disable MonomorphismRestriction.
Phantom constraints
When you put a polymorphic value with a constraint into a tuple, the tuple itself becomes polymorphic and has the same constraint:
myValue :: Read a => a
myValue = read "0"
myTuple :: Read a => (a, String)
myTuple = (myValue, "hello")
We know that the constraint affects the first part of the tuple but does not affect the second part. The type system doesn't know that, unfortunately, and will complain if you try to do this:
myString = snd myTuple
Even though intuitively one would expect myString to be just a String, the type checker needs to specialize the type variable a and figure out whether the constraint is actually satisfied. In order to make this expression work, one would need to annotate the type of either snd or myTuple:
myString = snd (myTuple :: ((), String))
In Haskell, as I'm sure you know, types are inferred. In other words, the compiler works out what type you want.
However, in Haskell, there are also polymorphic typeclasses, with functions that act in different ways depending on the return type. Here's an example of the Monad class, though I haven't defined everything:
class Monad m where
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
fail :: String -> m a
We're given a lot of functions with just type signatures. Our job is to make instance declarations for different types that can be treated as Monads, like Maybe t or [t].
Have a look at this code - it won't work in the way we might expect:
return 7
That's a function from the Monad class, but because there's more than one Monad, we have to specify what return value/type we want, or it automatically becomes an IO Monad. So:
return 7 :: Maybe Int
-- Will return...
Just 7
return 6 :: [Int]
-- Will return...
[6]
This is because [t] and Maybe have both been defined in the Monad type class.
Here's another example, this time from the random typeclass. This code throws an error:
random (mkStdGen 100)
Because random returns something in the Random class, we'll have to define what type we want to return, with a StdGen object tupelo with whatever value we want:
random (mkStdGen 100) :: (Int, StdGen)
-- Returns...
(-3650871090684229393,693699796 2103410263)
random (mkStdGen 100) :: (Bool, StdGen)
-- Returns...
(True,4041414 40692)
This can all be found at learn you a Haskell online, though you'll have to do some long reading. This, I'm pretty much 100% certain, it the only time when types are necessary.

STRef and phantom types

Does s in STRef s a get instantiated with a concrete type? One could easily imagine some code where STRef is used in a context where the a takes on Int. But there doesn't seem to be anything for the type inference to give s a concrete type.
Imagine something in pseudo Java like MyList<S, A>. Even if S never appeared in the implementation of MyList instantiating a concrete type like MyList<S, Integer> where a concrete type is not used in place of S would not make sense. So how can STRef s a work?
tl;dr - in practice it seems it always gets initialised to RealWorld in the end
The source notes that s can be instantiated to RealWorld inside invocations of stToIO, but is otherwise uninstantiated:
-- The s parameter is either
-- an uninstantiated type variable (inside invocations of 'runST'), or
-- 'RealWorld' (inside invocations of 'Control.Monad.ST.stToIO').
Looking at the actual code for ST however it seems runST uses a specific value realWorld#:
newtype ST s a = ST (STRep s a)
type STRep s a = State# s -> (# State# s, a #)
runST :: (forall s. ST s a) -> a
runST st = runSTRep (case st of { ST st_rep -> st_rep })
runSTRep :: (forall s. STRep s a) -> a
runSTRep st_rep = case st_rep realWorld# of
(# _, r #) -> r
realWorld# is defined as a magic primitive inside the GHC source code:
realWorldName = mkWiredInIdName gHC_PRIM (fsLit "realWorld#")
realWorldPrimIdKey realWorldPrimId
realWorldPrimId :: Id -- :: State# RealWorld
realWorldPrimId = pcMiscPrelId realWorldName realWorldStatePrimTy
(noCafIdInfo `setUnfoldingInfo` evaldUnfolding
`setOneShotInfo` stateHackOneShot)
You can also confirm this in ghci:
Prelude> :set -XMagicHash
Prelude> :m +GHC.Prim
Prelude GHC.Prim> :t realWorld#
realWorld# :: State# RealWorld
From your question I can not see if you understand why the phantom s type is there at all. Even if you did not ask for this explicitly, let me elaborate on that.
The role of the phantom type
The main use of the phantom type is to constrain references (aka pointers) to stay "inside" the ST monad. Roughly, the dynamically allocated data must end its life when runST returns.
To see the issue, let's pretend that the type of runST were
runST :: ST s a -> a
Then, consider this:
data Dummy
let var :: STRef Dummy Int
var = runST (newSTRef 0)
change :: () -> ()
change = runST (modifySTRef var succ)
access :: () -> Int
result :: (Int, ())
result = (access() , change())
in result
(Above I added a few useless () arguments to make it similar to imperative code)
Now what should be the result of the code above? It could be either (0,()) or (1,()) depending on the evaluation order. This is a big no-no in the pure Haskell world.
The issue here is that var is a reference which "escaped" from its runST. When you escape the ST monad, you are no longer forced to use the monad operator >>= (or equivalently, the do notation to sequentialize the order of side effects. If references are still around, then we can still have side effects around when there should be none.
To avoid the issue, we restrict runST to work on ST s a where a does not depend on s. Why this? Because newSTRef returns a STRef s a, a reference to a tagged with the phantom type s, hence the return type depends on s and can not be extracted from the ST monad through runST.
Technically, this restriction is done by using a rank-2 type:
runST :: (forall s. ST s a) -> a
the "forall" here is used to implement the restriction. The type is saying: choose any a you wish, then provide a value of type ST s a for any s I wish, then I will return an a. Mind that s is chosen by runST, not by the caller, so it could be absolutely anything. So, the type system will accept an application runST action only if action :: forall s. ST s a where s is unconstrained, and a does not involve s (recall that the caller has to choose a before runST chooses s).
It is indeed a slightly hackish trick to implement the independence constraint, but it does work.
On the actual question
To connect this to your actual question: in the implementation of runST, s will be chosen to be any concrete type. Note that, even if s were simply chosen to be Int inside runST it would not matter much, because the type system has already constrained a to be independent from s, hence to be reference-free. As #Ganesh pointed out, RealWorld is the type used by GHC.
You also mentioned Java. One could attempt to play a similar trick in Java as follows: (warning, overly simplified code follows)
interface ST<S,A> { A call(); }
interface STAction<A> { <S> ST<S,A> call(S dummy); }
...
<A> A runST(STAction<A> action} {
RealWorld dummy = new RealWorld();
return action.call(dummy).call();
}
Above in STAction parameter A can not depend on S.

Signature of IO in Haskell (is this class or data?)

The question is not what IO does, but how is it defined, its signature. Specifically, is this data or class, is "a" its type parameter then? I didn't find it anywhere. Also, I don't understand the syntactic meaning of this:
f :: IO a
You asked whether IO a is a data type: it is. And you asked whether the a is its type parameter: it is. You said you couldn't find its definition. Let me show you how to find it:
localhost:~ gareth.rowlands$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
Prelude> :i IO
newtype IO a
= GHC.Types.IO (GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, a #))
-- Defined in `GHC.Types'
instance Monad IO -- Defined in `GHC.Base'
instance Functor IO -- Defined in `GHC.Base'
Prelude>
In ghci, :i or :info tells you about a type. It shows the type declaration and where it's defined. You can see that IO is a Monad and a Functor too.
This technique is more useful on normal Haskell types - as others have noted, IO is magic in Haskell. In a typical Haskell type, the type signature is very revealing but the important thing to know about IO is not its type declaration, rather that IO actions actually perform IO. They do this in a pretty conventional way, typically by calling the underlying C or OS routine. For example, Haskell's putChar action might call C's putchar function.
IO is a polymorphic type (which happens to be an instance of Monad, irrelevant here).
Consider the humble list. If we were to write our own list of Ints, we might do this:
data IntList = Nil | Cons { listHead :: Int, listRest :: IntList }
If you then abstract over what element type it is, you get this:
data List a = Nil | Cons { listHead :: a, listRest :: List a }
As you can see, the return value of listRest is List a. List is a polymorphic type of kind * -> *, which is to say that it takes one type argument to create a concrete type.
In a similar way, IO is a polymorphic type with kind * -> *, which again means it takes one type argument. If you were to define it yourself, it might look like this:
data IO a = IO (RealWorld -> (a, RealWorld))
(definition courtesy of this answer)
The amount of magic in IO is grossly overestimated: it has some support from compiler and runtime system, but much less than newbies usually expect.
Here is the source file where it is defined:
http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-prim-0.3.0.0/src/GHC-Types.html
newtype IO a
= IO (State# RealWorld -> (# State# RealWorld, a #))
It is just an optimized version of state monad. If we remove optimization annotations we will see:
data IO a = IO (Realworld -> (Realworld, a))
So basically IO a is a data structure storing a function that takes old real world and returns new real world with io operation performed and a.
Some compiler tricks are necessary mostly to remove Realworld dummy value efficiently.
IO type is an abstract newtype - constructors are not exported, so you cannot bypass library functions, work with it directly and perform nasty things: duplicate RealWorld, create RealWorld out of nothing or escape the monad (write a function of IO a -> a type).
Since IO can be applied to objects of any type a, as it is a polymorphic monad, a is not specified.
If you have some object with type a, then it can be 'wrappered' as an object of type IO a, which you can think of as being an action that gives an object of type a. For example, getChar is of type IO Char, and so when it is called, it has the side effect of (From the program's perspective) generating a character, which comes from stdin.
As another example, putChar has type Char -> IO (), meaning that it takes a char, and then performs some action that gives no output (in the context of the program, though it will print the char given to stdout).
Edit: More explanation of monads:
A monad can be thought of as a 'wrapper type' M, and has two associated functions:
return and >>=.
Given a type a, it is possible to create objects of type M a (IO a in the case of the IO monad), using the return function.
return, therefore, has type a -> M a. Moreover, return attempts not to change the element that it is passed -- if you call return x, you will get a wrappered version of x that contains all of the information of x (Theoretically, at least. This doesn't happen with, for example, the empty monad.)
For example, return "x" will yield an M Char. This is how getChar works -- it yields an IO Char using a return statement, which is then pulled out of its wrapper with <-.
>>=, read as 'bind', is more complicated. It has type M a -> (a -> M b) -> M b, and its role is to take a 'wrappered' object, and a function from the underlying type of that object to another 'wrappered' object, and apply that function to the underlying variable in the first input.
For example, (return 5) >>= (return . (+ 3)) will yield an M Int, which will be the same M Int that would be given by return 8. In this way, any function that can be applied outside of a monad can also be applied inside of it.
To do this, one could take an arbitrary function f :: a -> b, and give the new function g :: M a -> M b as follows:
g x = x >>= (return . f)
Now, for something to be a monad, these operations must also have certain relations -- their definitions as above aren't quite enough.
First: (return x) >>= f must be equivalent to f x. That is, it must be equivalent to perform an operation on x whether it is 'wrapped' in the monad or not.
Second: x >>= return must be equivalent to m. That is, if an object is unwrapped by bind, and then rewrapped by return, it must return to its same state, unchanged.
Third, and finally (x >>= f) >>= g must be equivalent to x >>= (\y -> (f y >>= g) ). That is, function binding is associative (sort of). More accurately, if two functions are bound successively, this must be equivalent to binding the combination thereof.
Now, while this is how monads work, it's not how it's most commonly used, because of the syntactic sugar of do and <-.
Essentially, do begins a long chain of binds, and each <- sort of creates a lambda function that gets bound.
For example,
a = do x <- something
y <- function x
return y
is equivalent to
a = something >>= (\x -> (function x) >>= (\y -> return y))
In both cases, something is bound to x, function x is bound to y, and then y is returned to a in the wrapper of the relevant monad.
Sorry for the wall of text, and I hope it explains something. If there's more you need cleared up about this, or something in this explanation is confusing, just ask.
This is a very good question, if you ask me. I remember being very confused about this too, maybe this will help...
'IO' is a type constructor, 'IO a' is a type, the 'a' (in 'IO a') is an type variable. The letter 'a' carries no significance, the letter 'b' or 't1' could have been used just as well.
If you look at the definition of the IO type constructor you will see that it is a newtype defined as: GHC.Types.IO (GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, a #))
'f :: IO a' is the type of a function called 'f' of apparently no arguments that returns a result of some unconstrained type in the IO monad. 'in the IO monad' means that f can do some IO (i.e. change the 'RealWorld', where 'change' means replace the provided RealWorld with a new one) while computing its result. The result of f is polymorphic (that's a type variable 'a' not a type constant like 'Int'). A polymorphic result means that in your program it's the caller that determines the type of the result, so used in one place f could return an Int, used in another place it could return a String. 'Unconstrained' means that there's no type class restricting what type can be returned and so any type can be returned.
Why is 'f' a function and not a constant since there are no parameters and Haskell is pure? Because the definition of IO means that 'f :: IO a' could have been written 'f :: GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, a #)' and so in fact has a parameter -- the 'state of the real world'.
In the data IO a a have mainly the same meaning as in Maybe a.
But we can't rid of a constructor, like:
fromIO :: IO a -> a
fromIO (IO a) = a
Fortunately we could use this data in Monads, like:
{-# LANGUAGE ScopedTypeVariables #-}
foo = do
(fromIO :: a) <- (dataIO :: IO a)
...

Resources