Can I declare a NULL value in Haskell? - haskell

Just curious, seems when declaring a name, we always specify some valid values, like let a = 3. Question is, in imperative languages include c/java there's always a keyword of "null". Does Haskell has similar thing? When could a function object be null?

There is a “null” value that you can use for variables of any type. It's called ⟂ (pronounced bottom). We don't need a keyword to produce bottom values; actually ⟂ is the value of any computation which doesn't terminate. For instance,
bottom = let x = x in x -- or simply `bottom = bottom`
will infinitely loop. It's obviously not a good idea to do this deliberately, however you can use undefined as a “standard bottom value”. It's perhaps the closest thing Haskell has to Java's null keyword.
But you definitely shouldn't/can't use this for most of the applications where Java programmers would grab for null.
Since everything in Haskell is immutable, a value that's undefined will always stay undefined. It's not possible to use this as a “hold on a second, I'll define it later” indication†.
It's not possible to check whether a value is bottom or not. For rather deep theoretical reasons, in fact. So you can't use this for values that may or may not be defined.
And you know what? It's really good that Haskell does't allow this! In Java, you constantly need to be wary that values might be null. In Haskell, if a value is bottom than something is plain broken, but this will never be part of intended behaviour / something you might need to check for. If for some value it's intended that it might not be defined, then you must always make this explicit by wrapping the type in a Maybe. By doing this, you make sure that anybody trying to use the value must first check whether it's there. Not possible to forget this and run into a null-reference exception at runtime!
And because Haskell is so good at handling variant types, checking the contents of a Maybe-wrapped value is really not too cumbersome. You can just do it explicitly with pattern matching,
quun :: Int -> String
quun i = case computationWhichMayFail i of
Just j -> show j
Nothing -> "blearg, failed"
computationWhichMayFail :: Int -> Maybe Int
or you can use the fact that Maybe is a functor. Indeed it is an instance of almost every specific functor class: Functor, Applicative, Alternative, Foldable, Traversable, Monad, MonadPlus. It also lifts semigroups to monoids.
Dᴏɴ'ᴛ Pᴀɴɪᴄ now,
you don't need to know what the heck these things are. But when you've learned what they do, you will be able to write very concise code that automagically handles missing values always in the right way, with zero risk of missing a check.
†Because Haskell is lazy, you generally don't need to defer any calculations to be done later. The compiler will automatically see to it that the computation is done when it's necessary, and no sooner.

There is no null in Haskell. What you want is the Maybe monad.
data Maybe a
= Just a
| Nothing
Nothing refers to classic null and Just contains a value.
You can then pattern match against it:
foo Nothing = Nothing
foo (Just a) = Just (a * 10)
Or with case syntax:
let m = Just 10
in case m of
Just v -> print v
Nothing -> putStrLn "Sorry, there's no value. :("
Or use the supperior functionality provided by the typeclass instances for Functor, Applicative, Alternative, Monad, MonadPlus and Foldable.
This could then look like this:
foo :: Maybe Int -> Maybe Int -> Maybe Int
foo x y = do
a <- x
b <- y
return $ a + b
You can even use the more general signature:
foo :: (Monad m, Num a) => m a -> m a -> m a
Which makes this function work for ANY data type that is capable of the functionality provided by Monad. So you can use foo with (Num a) => Maybe a, (Num a) => [a], (Num a) => Either e a and so on.

Haskell does not have "null". This is a design feature. It completely prevents any possibility of your code crashing due to a null-pointer exception.
If you look at code written in an imperative language, 99% of the code expects stuff to never be null, and will malfunction catastrophically if you give it null. But then 1% of the code does expect nulls, and uses this feature to specify optional arguments or whatever. But you can't easily tell, by looking at the code, which parts are expecting nulls as legal arguments, and which parts aren't. Hopefully it's documented — but don't hold your breath!
In Haskell, there is no null. If that argument is declared as Customer, then there must be an actual, real Customer there. You can't just pass in a null (intentionally or by mistake). So the 99% of the code that is expecting a real Customer will always work.
But what about the other 1%? Well, for that we have Maybe. But it's an explicit thing; you have to explicitly say "this value is optional". And you have to explicitly check when you use it. You cannot "forget" to check; it won't compile.
So yes, there is no "null", but there is Maybe which is kinda similar, but safer.

Not in Haskell (or in many other FP languages). If you have some expression of some type T, its evaluation will give a value of type T, with the following exceptions:
infinite recursion may make the program "loop forever" and failing to return anything
let f n = f (n+1) in f 0
runtime errors can abort the program early, e.g.:
division by zero, square root of negative, and other numerical errors
head [], fromJust Nothing, and other partial functions used on invalid inputs
explicit calls to undefined, error "message", or other exception-throwing primitives
Note that even if the above cases might be regarded as "special" values called "bottoms" (the name comes from domain theory), you can not test against these values at runtime, in general. So, these are not at all the same thing as Java's null. More precisely, you can't write things like
-- assume f :: Int -> Int
if (f 5) is a division-by-zero or infinite recursion
then 12
else 4
Some exceptional values can be caught in the IO monad, but forget about that -- exceptions in Haskell are not idiomatic, and roughly only used for IO errors.
If you want an exceptional value which can be tested at run-time, use the Maybe a type, as #bash0r already suggested. This type is similar to Scala's Option[A] or Java's not-so-much-used Optional<A>.
The value is having both a type T and type Maybe T is to be able to precisely identify which functions always succeed, and which ones can fail. In Haskell the following is frowned upon, for instance:
-- Finds a value in a list. Returns -1 if not present.
findIndex :: Eq a => [a] -> a -> Int
Instead this is preferred:
-- Finds a value in a list. Returns Nothing if not present.
findIndex :: Eq a => [a] -> a -> Maybe Int
The result of the latter is less convenient than the one of the former, since the Int must be unwrapped at every call. This is good, since in this way each user of the function is prevented to simply "ignore" the not-present case, and write buggy code.

Related

Sum types - Why in Haskell is `show (Int | Double)` different than `(show Int) | (show Double)`

Why are these not equivalent?
show $ if someCondition then someInt else some double
and
if someCondition then show someInt else show someDouble
I understand that if you isolate the if ... else part in the first example to an expression by itself then you can't represent its type with an anonymous sum type, the kind of Int | Double, like something you could do easily in TypeScript (mentioning TypeScript because it is the langauge I used often and that supports Sum types), and would have to resort to using the Either data then based on it would call show.
The example I gave here is trivial but to me it makes more sense to think "Okay we are going to show something, and that something depends on someCondition" rather than "Okay if someCondition is true then show someInt otherwise show someDouble", and also allows for less code duplication (here the show is repeated twice but it could also be a long function application and instead of an if ... else there could be >2 branches to consider)
In my mind it should be easy for the compiler to check if each of the types that make the sum type (here Int | Double) could be used as a parameter to show function and decides if the types are correct or not. Even better is that show function always returns a string no matter the types of the parameters, so the compiler doesn't have to carry with it all the possible "branches" (so all the possible types).
Is it by choice that such a feature doesn't exist? Or is implementing it harder that I think?
All parts of an expression must be well-typed. The type of if someCondition then someInt else someDouble would have to be something like exists a. Show a => a, but Haskell doesn't support that kind of existential quantification.
Update: As chi points out in a comment, this would also be possible if Haskell had support for union/intersection types (which are not the same as sum/product types), but it unfortunately doesn't.
There are product types with lightweight syntax, written (,), in Haskell. One would thing that a sum type with a lightweight syntax, something like (Int | String), would be a great idea. The reality is more complicated. Let's see why (I'm taking some liberties with Num, they are not important).
if someCondition then 42 else "helloWorld"
If this should return a value of type like (Int | String), then what should the following return?
if someCondition then 42 else 0
(Int | Int) obviously, but if this is distinct from plain Int then we're in deep trouble. So (Int | Int) should be identical to plain Int.
One can immediately see that this is not just lightweight syntax for sum types, but a wholly new language feature. A different kind of type system if you will. Should we have one?
Let's look at this function.
mysteryType x a b = if x then a else b
Now what type does mysteryType have? Obviously
mysteryType :: Bool -> a -> b -> (a|b)
right? Now what if a and b are the same type?
let x = mysteryType True 42 0
This should be plain Int as we have agreed previously. Now mysteryType sometimes return an anonymous sum type, and sometimes it does not, depending on what arguments you pass. How would you pattern match such an expression? What on Earth can you do with it? Except trivial things like "show" (or whatever methods of other type-classes it would be an instance of), not a whole lot. Unless you add run-time type information to the language, that is, so typeof is available — and that make Haskell an entirely different language.
So yeah. Why isn't Haskell a TypeScript? Because we don't need another TypeScript. If you want TypeScript, you know where to find it.

How does return statement work in Haskell? [duplicate]

This question already has answers here:
What's so special about 'return' keyword
(3 answers)
Closed 5 years ago.
Consider these functions
f1 :: Maybe Int
f1 = return 1
f2 :: [Int]
f2 = return 1
Both have the same statement return 1. But the results are different. f1 gives value Just 1 and f2 gives value [1]
Looks like Haskell invokes two different versions of return based on return type. I like to know more about this kind of function invocation. Is there a name for this feature in programming languages?
This is a long meandering answer!
As you've probably seen from the comments and Thomas's excellent (but very technical) answer You've asked a very hard question. Well done!
Rather than try to explain the technical answer I've tried to give you a broad overview of what Haskell does behind the scenes without diving into technical detail. Hopefully it will help you to get a big picture view of what's going on.
return is an example of type inference.
Most modern languages have some notion of polymorphism. For example var x = 1 + 1 will set x equal to 2. In a statically typed language 2 will usually be an int. If you say var y = 1.0 + 1.0 then y will be a float. The operator + (which is just a function with a special syntax)
Most imperative languages, especially object oriented languages, can only do type inference one way. Every variable has a fixed type. When you call a function it looks at the types of the argument and chooses a version of that function that fits the types (or complains if it can't find one).
When you assign the result of a function to a variable the variable already has a type and if it doesn't agree with the type of the return value you get an error.
So in an imperative language the "flow" of type deduction follows time in your program Deduce the type of a variable, do something with it and deduce the type of the result. In a dynamically typed language (such as Python or javascript) the type of a variable is not assigned until the value of the variable is computed (which is why there don't seem to be types). In a statically typed language the types are worked out ahead of time (by the compiler) but the logic is the same. The compiler works out what the types of variables are going to be, but it does so by following the logic of the program in the same way as the program runs.
In Haskell the type inference also follows the logic of the program. Being Haskell it does so in a very mathematically pure way (called System F). The language of types (that is the rules by which types are deduced) are similar to Haskell itself.
Now remember Haskell is a lazy language. It doesn't work out the value of anything until it needs it. That's why it makes sense in Haskell to have infinite data structures. It never occurs to Haskell that a data structure is infinite because it doesn't bother to work it out until it needs to.
Now all that lazy magic happens at the type level too. In the same way that Haskell doesn't work out what the value of an expression is until it really needs to, Haskell doesn't work out what the type of an expression is until it really needs to.
Consider this function
func (x : y : rest) = (x,y) : func rest
func _ = []
If you ask Haskell for the type of this function it has a look at the definition, sees [] and : and deduces that it's working with lists. But it never needs to look at the types of x and y, it just knows that they have to be the same because they end up in the same list. So it deduces the type of the function as [a] -> [a] where a is a type that it hasn't bothered to work out yet.
So far no magic. But it's useful to notice the difference between this idea and how it would be done in an OO language. Haskell doesn't convert the arguments to Object, do it's thing and then convert back. Haskell just hasn't been asked explicitly what the type of the list is. So it doesn't care.
Now try typing the following into ghci
maxBound - length ""
maxBound : "Hello"
Now what just happened !? minBound bust be a Char because I put it on the front of a string and it must be an integer because I added it to 0 and got a number. Plus the two values are clearly very different.
So what is the type of minBound? Let's ask ghci!
:type minBound
minBound :: Bounded a => a
AAargh! what does that mean? Basically it means that it hasn't bothered to work out exactly what a is, but is has to be Bounded if you type :info Bounded you get three useful lines
class Bounded a where
minBound :: a
maxBound :: a
and a lot of less useful lines
So if a is Bounded there are values minBound and maxBound of type a.
In fact under the hood Bounded is just a value, it's "type" is a record with fields minBound and maxBound. Because it's a value Haskell doesn't look at it until it really needs to.
So I appear to have meandered somewhere in the region of the answer to your question. Before we move onto return (which you may have noticed from the comments is a wonderfully complex beast.) let's look at read.
ghci again
read "42" + 7
read "'H'" : "ello"
length (read "[1,2,3]")
and hopefully you won't be too surprised to find that there are definitions
read :: Read a => String -> a
class Read where
read :: String -> a
so Read a is just a record containing a single value which is a function String -> a. Its very tempting to assume that there is one read function which looks at a string, works out what type is contained in the string and returns that type. But it does the opposite. It completely ignores the string until it's needed. When the value is needed, Haskell first works out what type it's expecting, once it's done that it goes and gets the appropriate version of the read function and combines it with the string.
now consider something slightly more complex
readList :: Read a => [String] -> a
readList strs = map read strs
under the hood readList actually takes two arguments
readList' (Read a) -> [String] -> [a]
readList' {read = f} strs = map f strs
Again as Haskell is lazy it only bothers looking at the arguments when it's needs to find out the return value, at that point it knows what a is, so the compiler can go and fine the right version of Read. Until then it doesn't care.
Hopefully that's given you a bit of an idea of what's happening and why Haskell can "overload" on the return type. But it's important to remember it's not overloading in the conventional sense. Every function has only one definition. It's just that one of the arguments is a bag of functions. read_str doesn't ever know what types it's dealing with. It just knows it gets a function String -> a and some Strings, to do the application it just passes the arguments to map. map in turn doesn't even know it gets strings. When you get deeper into Haskell it becomes very important that functions don't know very much about the types they're dealing with.
Now let's look at return.
Remember how I said that the type system in Haskell was very similar to Haskell itself. Remember that in Haskell functions are just ordinary values.
Does this mean I can have a type that takes a type as an argument and returns another type? Of course it does!
You've seen some type functions Maybe takes a type a and returns another type which can either be Just a or Nothing. [] takes a type a and returns a list of as. Type functions in Haskell are usually containers. For example I could define a type function BinaryTree which stores a load of a's in a tree like structure. There are of course lots of much stranger ones.
So, if these type functions are similar to ordinary types I can have a typeclass that contains type functions. One such typeclass is Monad
class Monad m where
return a -> m a
(>>=) m a (a -> m b) -> m b
so here m is some type function. If I want to define Monad for m I need to define return and the scary looking operator below it (which is called bind)
As others have pointed out the return is a really misleading name for a fairly boring function. The team that designed Haskell have since realised their mistake and they're genuinely sorry about it. return is just an ordinary function that takes an argument and returns a Monad with that type in it. (You never asked what a Monad actually is so I'm not going to tell you)
Let's define Monad for m = Maybe!
First I need to define return. What should return x be? Remember I'm only allowed to define the function once, so I can't look at x because I don't know what type it is. I could always return Nothing, but that seems a waste of a perfectly good function. Let's define return x = Just x because that's literally the only other thing I can do.
What about the scary bind thing? what can we say about x >>= f? well x is a Maybe a of some unknown type a and f is a function that takes an a and returns a Maybe b. Somehow I need to combine these to get a Maybe b`
So I need to define Nothing >== f. I can't call f because it needs an argument of type a and I don't have a value of type a I don't even know what 'a' is. I've only got one choice which is to define
Nothing >== f = Nothing
What about Just x >>= f? Well I know x is of type a and f takes a as an argument, so I can set y = f a and deduce that y is of type b. Now I need to make a Maybe b and I've got a b so ...
Just x >>= f = Just (f x)
So I've got a Monad! what if m is List? well I can follow a similar sort of logic and define
return x = [x]
[] >>= f = []
(x : xs) >>= a = f x ++ (xs >>= f)
Hooray another Monad! It's a nice exercise to go through the steps and convince yourself that there's no other sensible way of defining this.
So what happens when I call return 1?
Nothing!
Haskell's Lazy remember. The thunk return 1 (technical term) just sits there until someone needs the value. As soon as Haskell needs the value it know what type the value should be. In particular it can deduce that m is List. Now that it knows that Haskell can find the instance of Monad for List. As soon as it does that it has access to the correct version of return.
So finally Haskell is ready To call return, which in this case returns [1]!
The return function is from the Monad class:
class Applicative m => Monad (m :: * -> *) where
...
return :: a -> m a
So return takes any value of type a and results in a value of type m a. The monad, m, as you've observed is polymorphic using the Haskell type class Monad for ad hoc polymorphism.
At this point you probably realize return is not an good, intuitive, name. It's not even a built in function or a statement like in many other languages. In fact a better-named and identically-operating function exists - pure. In almost all cases return = pure.
That is, the function return is the same as the function pure (from the Applicative class) - I often think to myself "this monadic value is purely the underlying a" and I try to use pure instead of return if there isn't already a convention in the codebase.
You can use return (or pure) for any type that is a class of Monad. This includes the Maybe monad to get a value of type Maybe a:
instance Monad Maybe where
...
return = pure -- which is from Applicative
...
instance Applicative Maybe where
pure = Just
Or for the list monad to get a value of [a]:
instance Applicative [] where
{-# INLINE pure #-}
pure x = [x]
Or, as a more complex example, Aeson's parse monad to get a value of type Parser a:
instance Applicative Parser where
pure a = Parser $ \_path _kf ks -> ks a

Opposite of Maybe data type

So the Maybe data type is defined like this:
data Maybe a = Just a | Nothing
What would you call the data type that's conceptually opposite to Maybe:
data <Type> = Okay | Error String
That is, a type that declares the computation successful or holds some error produced by the computation.
I refute the premise that this type is in a meaningful sense “opposite” to Maybe. I also don't agree that Either should generally be understood as an error-signalling type – that's just quite a natural way of using it, due to the way its monad instance works.
Both Maybe and Either know nothing about errors / failure – they're just implementations of the abstract concept of a sum type (in the case of Maybe a sum with the unit type).
IMO, you should just use Maybe String for this purpose, or if you like it explicit:
type ErrorMsg = String
type PossibleError = Maybe ErrorMsg
I would use Either String (), using the common convention that an Either is used to signal errors with information in the Left and the success value in the Right. If you don't actually have a success value, use the unit type ().
Of course, that still needs to be wrapped in some monad, because in Haskell a pure function without a result is not useful. If the purpose of your function is just to check the validity of some data, then it's not an error to return the error string, and I'd go back to using Maybe.
Your <Type> is equivalent to Either String (), so you could just have
type CanError = Either String ()
isOkay :: CanError -> Bool
isOkay = Data.Either.isRight
isError :: CanError -> Bool
isError = Data.Either.isLeft
getErrorMsg :: CanError -> Maybe String
getErrorMsg (Left msg) = Just msg
getErrorMsg _ = Nothing
You can use Either String as a Monad/Applicative/Functor, but not CanError since it has kind *, not * -> * as required by each of those typeclasses. I would recommend just using Either String as is since you get the extra power of Monad/Applicative/Functor/etc., and when you need the equivalent of CanError just have the return type be Either String () in-line.
Another example for what perhaps is the general idea behind your question, that success is the exceptional case and failure is the normal case, is EitherR as provided in the errors package whose Monad instance is suggestively referred to as the "success" monad. As the name suggests there's no magic here, it's just a newtype with Monad instances swapped around. The interpretation, however, is interesting.
You can program in a world where success falls through while errors are kept around. As the package documentation indicates, this comes in handy when dealing with stacks of exception handlers.

How to extract value from monadic action

Is there a built-in function with signature :: (Monad m) => m a -> a ?
Hoogle tells that there is no such function.
Can you explain why?
A monad only supplies two functions:
return :: Monad m => a -> m a
(>>=) :: Monad m => m a -> (a -> m b) -> m b
Both of these return something of type m a, so there is no way to combine these in any way to get a function of type Monad m => m a -> a. To do that, you'll need more than these two functions, so you need to know more about m than that it's a monad.
For example, the Identity monad has runIdentity :: Identity a -> a, and several monads have similar functions, but there is no way to provide it generically. In fact, the inability to "escape" from the monad is essential for monads like IO.
There is probably a better answer than this, but one way to see why you cannot have a type (Monad m) => m a -> a is to consider a null monad:
data Null a = Null
instance Monad Null where
return a = Null
ma >>= f = Null
Now (Monad m) => m a -> a means Null a -> a, ie getting something out of nothing. You can't do that.
This doesn't exist because Monad is a pattern for composition, not a pattern for decomposition. You can always put more pieces together with the interface it defines. It doesn't say a thing about taking anything apart.
Asking why you can't take something out is like asking why Java's Iterator interface doesn't contain a method for adding elements to what it's iterating over. It's just not what the Iterator interface is for.
And your arguments about specific types having a kind of extract function follows in the exact same way. Some particular implementation of Iterator might have an add function. But since it's not what Iterators are for, the presence that method on some particular instance is irrelevant.
And the presence of fromJust is just as irrelevant. It's not part of the behavior Monad is intended to describe. Others have given lots of examples of types where there is no value for extract to work on. But those types still support the intended semantics of Monad. This is important. It means that Monad is a more general interface than you are giving it credit for.
Suppose there was such a function:
extract :: Monad m => m a -> a
Now you could write a "function" like this:
appendLine :: String -> String
appendLine str = str ++ extract getLine
Unless the extract function was guaranteed never to terminate, this would violate referential transparency, because the result of appendLine "foo" would (a) depend on something other than "foo", (b) evaluate to different values when evaluated in different contexts.
Or in simpler words, if there was an actually useful extract operation Haskell would not be purely functional.
Is there a build-in function with signature :: (Monad m) => m a -> a ?
If Hoogle says there isn't...then there probably isn't, assuming your definition of "built in" is "in the base libraries".
Hoogle tells that there is no such function. Can you explain why?
That's easy, because Hoogle didn't find any function in the base libraries that matches that type signature!
More seriously, I suppose you were asking for the monadic explanation. The issues are safety and meaning. (See also my previous thoughts on magicMonadUnwrap :: Monad m => m a -> a)
Suppose I tell you I have a value which has the type [Int]. Since we know that [] is a monad, this is similar to telling you I have a value which has the type Monad m => m Int. So let's suppose you want to get the Int out of that [Int]. Well, which Int do you want? The first one? The last one? What if the value I told you about is actually an empty list? In that case, there isn't even an Int to give you! So for lists, it is unsafe to try and extract a single value willy-nilly like that. Even when it is safe (a non-empty list), you need a list-specific function (for example, head) to clarify what you mean by desiring f :: [Int] -> Int. Hopefully you can intuit from here that the meaning of Monad m => m a -> a is simply not well defined. It could hold multiple meanings for the same monad, or it could mean absolutely nothing at all for some monads, and sometimes, it's just simply not safe.
Because it may make no sense (actually, does make no sense in many instances).
For example, I might define a Parser Monad like this:
data Parser a = Parser (String ->[(a, String)])
Now there is absolutely no sensible default way to get a String out of a Parser String. Actually, there is no way at all to get a String out of this with just the Monad.
There is a useful extract function and some other functions related to this at http://hackage.haskell.org/package/comonad-5.0.4/docs/Control-Comonad.html
It's only defined for some functors/monads and it doesn't necessarily give you the whole answer but rather gives an answer. Thus there will be possible subclasses of comonad that give you intermediate stages of picking the answer where you could control it. Probably related to the possible subclasses of Traversable. I don't know if such things are defined anywhere.
Why hoogle doesn't list this function at all appears to be because the comonad package isn't indexed otherwise I think the Monad constraint would be warned and extract would be in the results for those Monads with a Comonad instance. Perhaps this is because the hoogle parser is incomplete and fails on some lines of code.
My alternative answers:
you can perform a - possibly recursive - case analysis if you've imported the type's constructors
You can slink your code that would use the extracted values into the monad using monad >>= \a -> return $ your code uses a here as an alternative code structure and as long as you can convert the monad to "IO ()" in a way that prints your outputs you're done. This doesn't look like extraction but maths isn't the same as the real world.
Well, technicaly there is unsafePerformIO for the IO monad.
But, as the name itself suggests, this function is evil and you should only use it if you really know what you are doing (and if you have to ask wether you know or not then you don't)

determining function behavior from the type of the function

New to Haskell so sorry if this is very basic
This example is taken from "Real World Haskell" -
ghci> :type fst
fst :: (a, b) -> a
They show the type of the fst function and then follow it with this paragraph...
"The result type of fst is a. We've already mentioned that parametric polymorphism makes the real type inaccessible: fst doesn't have enough information to construct a value of type a, nor can it turn an a into a b. So the only possible valid behaviour (omitting infinite loops or crashes) it can have is to return the first element of the pair."
I feel like I am missing the fundamental point of the paragraph, and perhaps something important about Haskell. Why couldn't the fst function return type b? Why couldn't it take the tuple as a param, but simply return an Int ( or any other type that is NOT a)? I don't understand why it MUST return type a?
Thanks
If it did any of those things, its type would change. What the quote is saying is that, given that we know fst is of type (a, b) -> a, we can make those deductions about it. If it had a different type, we would not be able to do so.
For instance, see that
snd :: (a, b) -> a
snd (x, y) = y
does not type-check, and so we know a value of type (a, b) -> a cannot behave like snd.
Parametricity is basically the fact that a polymorphic function of a certain type must obey certain laws by construction — i.e., there is no well-typed expression of that type that does not obey them. So, for it to be possible to prove things about fst with it, we must first know fst's type.
Note especially the word polymorphism there: we can't do the same kind of deductions about non-polymorphic types. For instance,
myFst :: (Int, String) -> Int
myFst (a, b) = a
type-checks, but so does
myFst :: (Int, String) -> Int
myFst (a, b) = 42
and even
myFst :: (Int, String) -> Int
myFst (a, b) = length b
Parametricity relies crucially on the fact that a polymorphic function can't "look" at the types it is called with. So the only value of type a that fst knows about is the one it's given: the first element of the tuple.
The point is that once you have that type, the implementation options are greatly limited. If you returned an Int, then your type would be (a,b) -> Int. Since a could be anything, we can't gin one up out of thin air in the implementation without resorting to undefined, and so must return the one given to us by the caller.
You should read the Theorems for Free article.
Let's try to add some more hand-waving to that already given by Real World Haskell. Lets try to convince ourselves that given that we have a function fst with type (a,b) -> a the only total function it can be is the following one:
fst (x,y) = x
First of all, we cannot return anything other then a value of type a, that is in the premise that fst has type (a,b) -> a, so we cannot have fst (x,y) = y or fst (x,y) = 1 because that does not have the correct type.
Now, as RWH says, if I give fst an (Int,Int), fst doesn't know these are Ints, furthermore, a or b are not required to belong to any type class so fst has no available values or functions associated with a or b.
So fst only knows about the a value and the b value that I give it and I can't turn b into an a (It can't make a b -> a function) so it must return the given a value.
This isn't actually just magical hand waving, one can actually deduce what possible expressions there are of a given polymorphic type. There is actually a program called djinn that does exactly that.
The point here is that both a and b are type variables (that might be the same, but that's not needed). Anyway, since for a given tuple of two elements, fst returns always the first element, the returned type must be always the same as the type for the first element.
The fundamental thing you're probably missing is this:
In most programming languages, if you say "this function returns any type", it means that the function can decide what type of value it actually returns.
In Haskell, if you say "this function returns any type", it means that the caller gets to decide what type that should be. (!)
So if I you write foo :: Int -> x, it can't just return a String, because I might not ask it for a String. I might ask for a Customer, or a ThreadId, or anything.
Obviously, there's no way that foo can know how to create a value of every possible type, even types that don't exist yet. In short, it is impossible to write foo. Everything you try will give you type errors, and won't compile.
(Caveat: There is a way to do it. foo could loop forever, or throw an exception. But it cannot return a valid value.)
There's no way for a function to be able to create values of any possible type. But it's perfectly possible for a function to move data around without caring what type it is. Therefore, if you see a function that accepts any kind of data, the only thing it can be doing with it is to move it around.
Alternatively, if the type has to belong to a specific class, then the function can use the methods of that class on it. (Could. It doesn't have to, but it can if it wants to.)
Fundamentally, this is why you can actually tell what a function does just by looking at its type signature. The type signature tells you what the function "knows" about the data it's going to be given, and hence what possible operations it might perform. This is why searching for a Haskell function by its type signature is so damned useful.
You've heard the expression "in Haskell, if it compiles, it usually works right"? Well, this is why. ;-)

Resources