Functional Dependency in Haskell

I cannot really get it. Why do we need it at all? I mean, if I use the same type parameter, I think that means they should be the same type.
I heard it can help the compiler avoid infinite loops. Can someone tell me some more details about that?
In the end, are there any 'patterns and practices' we should follow on the usage of functional dependencies in Real World Haskell?
[Follow-up Question]
class Extract container element where
    extract :: container -> element

instance Extract (a, b) a where
    extract (x, _) = x
In the code above, I used the same type variable 'a' for both the container and the element, so I think the compiler can infer that these two types are the same.
But when I tried this code in GHCi, I got the following feedback:
*Main> extract('x',3)
<interactive>:1:0:
No instance for (Extract (Char, t) element)
arising from a use of `extract' at <interactive>:1:0-13
Possible fix:
add an instance declaration for (Extract (Char, t) element)
In the expression: extract ('x', 3)
In the definition of `it': it = extract ('x', 3)
When one of them has been determined to be Char, why is the other one still the unresolved type 'element'?

Basically, if you have an FD relation a -> b, it means that among the instances of the class there can be only one 'b' for any given 'a': you can have an instance at (Int, Int), but then you cannot also have one at (Int, Float). That is what is meant by saying that 'b' is uniquely determined by 'a', and it extends to any number of type parameters. Functional dependencies are needed for two reasons: (1) type inference, since they tell the compiler how to resolve type variables that would otherwise be ambiguous, and (2) sometimes you actually want that uniqueness constraint on your instances.
An alternative to FDs is the type families extension, although it does not cover every use of FDs.
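As a concrete illustration of point (1), here is a minimal sketch (not part of the original answer) of how a functional dependency resolves the follow-up question above: with container -> element, picking the container type (Char, t) forces the element type to be Char.

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

-- The dependency "container -> element" says that the container type
-- uniquely determines the element type.
class Extract container element | container -> element where
    extract :: container -> element

-- For any pair (a, b), the extracted element is the first component.
instance Extract (a, b) a where
    extract (x, _) = x

-- Now this is well typed: the container (Char, Int) determines the
-- element to be Char, so GHC no longer reports an ambiguous 'element'.
firstOfPair :: Char
firstOfPair = extract ('x', 3 :: Int)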

Related

What does a stand for in a data type declaration?

Normally when using type declarations we do:
function_name :: Type -> Type
However in an exercise I am trying to solve there is the following structure:
function_name :: Type a -> Type a
or explicitly as in the exercise
alphabet :: DFA a -> Alphabet a
alphabet = undefined
What does a stand for?
Short answer: it's a type variable.
At the computation level, the way we define functions is to use variables to refer to their arguments. Like this:
f x = x + 3
Here x is a variable, and its value will be chosen when the function is called. Haskell has a similar (but not identical...) mechanism in its type sublanguage. For example, you can write things like:
type F x = (x, Int, x)
type Endo a = a -> a
Here again x is a variable in the first one (and a in the second), and its value will be chosen at use sites. One can also use this mechanism when defining new types. (The previous two examples just give new names to existing types, but the following does more.) One of the most basic nontrivial examples of this is the Maybe family of types:
data Maybe a = Nothing | Just a
The things on the right of the = are computation-level, so you can mostly ignore them for now, but on the left we are declaring a new family of types Maybe which accepts other types as an argument. For example, Maybe Int, Maybe (Bool, String), Maybe (Endo Char), and even passing in expressions that have variables like Maybe (x, Int, x) are all possible.
Syntactically, type constructors (things which are defined as part of the program text and that we expect the compiler to look up the definition for) start with an upper case letter and type variables (things which will be instantiated later and so don't currently have a concrete definition) start with lower case letters.
So, in the type signature you showed:
alphabet :: DFA a -> Alphabet a
I suspect there are actually two constructs new to you, not just one: first, the type variable a that you asked about, and second, the concept of type application, where we apply at the type level one "function-like" type to another. (Outside of this answer, people say "parameterized" instead of "function-like".)
...and, believe it or not, there is even a type system for types that makes sure you don't write things like these:
Int a           -- Int is not parameterized, so shouldn't be applied to arguments
Int Char        -- ditto
Maybe -> String -- Maybe is parameterized, so should be applied to arguments, but isn't
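For the curious, this "type system for types" is the kind system, and GHCi can report kinds with the :kind command. A short, indicative session (the exact output may vary slightly between GHC versions):
ghci> :kind Int
Int :: *
ghci> :kind Maybe
Maybe :: * -> *
ghci> :kind Maybe Int
Maybe Int :: *
Asking for the kind of Int Char or of Maybe -> String is rejected with a kind error, which is exactly the check described above.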

Data declaration with no data constructor. Can it be instantiated? Why does it compile?

Reading one of my Haskell books, I came across the sentence:
Data declarations always create a new type constructor, but may or may not create new data constructors.
It sounded strange that one should be able to declare a data type with no data constructor, because it seems that then one could never instantiate the type. So I tried it out. The following data declaration compiles without error.
data B = String
How would one create an instance of this type? Is it possible? I cannot seem to find a way.
I thought maybe a data constructor with name matching the type constructor would be created automatically, but that does not appear to be the case, as shown by the error resulting from attempting to use B as a data constructor with the declaration in scope.
Prelude> data B = String deriving Show
Prelude> B
<interactive>:129:1: error: Data constructor not in scope: B
Why is this data declaration permitted to compile if the type can never be instantiated?
Is it permitted solely for some formal reason despite not having a known practical application?
I also wonder whether my book's statement about data types with no constructor might be referring to types declared via the type or newtype keywords instead of by data.
In the type case, type synonyms clearly do not use data constructors, as illustrated by the following.
Prelude> type B = String
Prelude>
Type synonyms such as this can be instantiated via the constructors of the type they are aliases for. But I am not convinced that this is what my book is referring to, since a type synonym does not seem to declare a new data type so much as define a new alias for an existing type.
In the newtype case, it appears that types without data constructors cannot be created, as shown by the following error.
Prelude> newtype B = String
<interactive>:132:13: error:
• The constructor of a newtype must have exactly one field
but ‘String’ has none
• In the definition of data constructor ‘String’
In the newtype declaration for ‘B’
type and newtype do not appear to be what the book is referring to, which brings me back to my original question: why is it possible to declare a type using data with no data constructor?
How would one create an instance of this type?
The statement from your book is correct, but your example is not. data B = String defines a type constructor B and a data constructor String, both taking no arguments. Note that the String you define here lives in the value namespace, so it is different from the usual String type (a synonym for [Char]), which lives in the type namespace.
ghci> data B = String
ghci> x = String
ghci> :t x
x :: B
However, here is an example of a data definition without data constructors (so it cannot be instantiated).
ghci> data B
Now, I have a new type constructor B, but no data constructors to produce values of type B. In fact, such a data type is declared in the Haskell base: it is called Void:
ghci> import Data.Void
ghci> :i Void
data Void -- Defined in ‘Data.Void’
Why is this data declaration permitted to compile if the type can never be instantiated?
Being able to have uninhabited types turns out to be useful in a handful of places. The examples I can think of right now mostly involve passing such a type as a type parameter to another type constructor. One practical use case is in streaming libraries like conduit.
There is a ConduitM i o m r type constructor, where i is the type of the input stream elements, o is the type of the output stream elements, m is the monad in which actions are performed, and r is the final result produced at the end.
Then, it defines a Sink as
type Sink i m r = ConduitM i Void m r
since a Sink should never output any values. Void is a compile time guarantee that Sink cannot output any (non-bottom) values.
Much like Identity, Void is mostly useful in conjunction with other abstractions.
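As a small illustration (my own sketch, not from the original answer), Data.Void also exports absurd :: Void -> a, which lets you dismiss branches that can never be reached:

import Data.Void (Void, absurd)

-- An Either whose Left side is Void can only ever be a Right,
-- and absurd lets us tell the compiler so.
unwrapRight :: Either Void a -> a
unwrapRight (Right x) = x
unwrapRight (Left v)  = absurd v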
... type synonyms clearly do not use data constructors
Yes, but they do not define type constructors either. Synonyms are just surface-level convenience renamings; under the hood, nothing new is defined.
In the newtype case, it appears that types without data constructors cannot be created, as shown by the following error.
I suggest you look up what newtype is for. The whole point of newtype is to provide a zero-cost wrapper around an existing type. That means you have exactly one constructor taking exactly one argument (the wrapped value). After compilation, the wrapping and unwrapping operations are no-ops.
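For example, a valid newtype version of the declaration above (a small sketch, with the constructor also named B) looks like this:

-- The single constructor B wraps the single field of type String.
newtype B = B String

wrap :: String -> B
wrap = B

unwrap :: B -> String
unwrap (B s) = s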

GHC doesn't pick the only available instance

I'm trying to write a CSS DSL in Haskell and keep the syntax as close to CSS as possible. One difficulty is that certain terms can appear both as a property and as a value. For example flex: in CSS you can have both "display: flex" and "flex: 1".
I took inspiration from the Lucid API, which overloads functions based on their arguments to generate either attributes or DOM nodes (which sometimes also share names, e.g. <style> and <div style="...">).
Anyway, I've run into a problem where GHC fails to typecheck the code (Ambiguous type variable) in a place where it is supposed to pick one of the two available typeclass instances. There is only one instance that fits (and indeed, in the type error GHC prints "These potential instance exist:" and then lists just one). I'm confused that, given the choice of a single instance, GHC refuses to use it. Of course, if I add explicit type annotations, the code compiles. Full example below (the only dependency is mtl, for Writer).
{-# LANGUAGE FlexibleInstances #-}

module Style where

import Control.Monad.Writer.Lazy

type StyleM = Writer [(String, String)]

newtype Style = Style { runStyle :: StyleM () }

class Term a where
    term :: String -> a

instance Term String where
    term = id

instance Term (String -> StyleM ()) where
    term property value = tell [(property, value)]

display :: String -> StyleM ()
display = term "display"

flex :: Term a => a
flex = term "flex"

someStyle :: Style
someStyle = Style $ do
    flex "1"     -- [1] :: StyleM ()
    display flex -- [2]
And the error:
Style.hs:29:5: error:
• Ambiguous type variable ‘a0’ arising from a use of ‘flex’
prevents the constraint ‘(Term
([Char]
-> WriterT
[(String, String)]
Data.Functor.Identity.Identity
a0))’ from being solved.
(maybe you haven't applied a function to enough arguments?)
Probable fix: use a type annotation to specify what ‘a0’ should be.
These potential instance exist:
one instance involving out-of-scope types
instance Term (String -> StyleM ()) -- Defined at Style.hs:17:10
• In a stmt of a 'do' block: flex "1"
In the second argument of ‘($)’, namely
‘do { flex "1";
display flex }’
In the expression:
Style
$ do { flex "1";
display flex }
Failed, modules loaded: none.
I've found two ways to make this code compile, neither of which I'm happy with.
Add an explicit type annotation where the flex function is used ([1]).
Move the line where flex is used to the end of the do block (e.g. comment out [2]).
One difference between my API and Lucid is that the Lucid terms always take one argument, and Lucid uses fundeps, which presumably gives the GHC typechecker more information to work with (to choose the correct typeclass instance). But in my case the terms don't always have an argument (when they appear as the value).
The problem is that the Term instance for String -> StyleM () only exists when StyleM is applied to the result type (). But in a do-block like
someStyle :: Style
someStyle = Style $ do
    flex "1"
    return ()
there is not enough information to determine the result type of flex "1", because the returned value is thrown away.
A common solution to this problem is the "constraint trick". It requires type equality constraints, so you have to enable {-# LANGUAGE TypeFamilies #-}
or {-# LANGUAGE GADTs #-} and tweak the instance like this:
{-# LANGUAGE TypeFamilies #-}

instance (a ~ ()) => Term (String -> StyleM a) where
    term property value = tell [(property, value)]
This tells the compiler: "You don't need to know the precise type a to get the instance, there is one for all types! However, once the instance is determined, you'll always find that the type was () after all!"
This trick is the typeclass version of Henry Ford's "You can have any color you like, as long as it's black." The compiler can find an instance despite the ambiguity, and finding the instance gives it enough information to resolve the ambiguity.
It works because Haskell's instance resolution never backtracks, so once an instance "matches", the compiler has to commit to any equalities it discovers in the preconditions of the instance declaration, or throw a type error.
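Putting the pieces together, the whole example from the question compiles once the instance is generalized this way; a minimal sketch (assuming only mtl, as in the question):

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeFamilies #-}

module Style where

import Control.Monad.Writer.Lazy

type StyleM = Writer [(String, String)]

newtype Style = Style { runStyle :: StyleM () }

class Term a where
    term :: String -> a

instance Term String where
    term = id

-- The constraint trick: match on any result type a, then force a ~ ().
instance (a ~ ()) => Term (String -> StyleM a) where
    term property value = tell [(property, value)]

display :: String -> StyleM ()
display = term "display"

flex :: Term a => a
flex = term "flex"

someStyle :: Style
someStyle = Style $ do
    flex "1"     -- now unambiguous, even though the result is discarded
    display flex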
There is only one instance which fits (and indeed, in the type error GHC prints "These potential instance exist:" and then it lists just one). I'm confused that given the choice of a single instance, GHC refuses to use it.
Type classes are open; any module could define new instances. So GHC never assumes that it knows about all instances, when checking a use of a type class. (With the possible exception of the bad extensions like OverlappingInstances.) Logically, then, the only possible answers to a question "is there an instance for C T" are "yes" and "I don't know". To answer "no" risks incoherence with another part of your program that does define an instance C T.
So, you should not imagine the compiler iterating over every declared instance and seeing whether it fits at the particular use site of interest, because what would it do with all the "I don't know"s? Instead, the process works like this: infer the most general type that could be used at the particular use site and query the instance store for the needed instance. The query can return a more general instance than the one needed, but it can never return a more specific instance, since it would have to choose which more specific instance to return; then your program is ambiguous.
One way to think about the difference is that iterating over all declared instances for C would take linear time in the number of instances, while querying the instance store for a specific instance only has to examine a constant number of potential instances. For example, if I want to type check
Left True == Left False
I need an instance for Eq (Either Bool t), which can only be satisfied by one of
instance Eq (Either Bool t)
instance Eq (Either a t) -- *
instance Eq (f Bool t)
instance Eq (f a t)
instance Eq (g t)
instance Eq b
(The instance marked * is the one that actually exists, and in standard Haskell (without FlexibleInstances) it's the only one of these instances that is legal to declare; the traditional restriction to instances of the form C (T var1 ... varN) makes this step easy since there will always be exactly one potential instance.)
If instances are stored in something like a hash table then this query can be done in constant time regardless of the number of declared instances of Eq (which is probably a pretty large number).
In this step, only instance heads (the stuff to the right of the =>) are examined. Along with a "yes" answer, the instance store can return new constraints on type variables that come from the context of the instance (the stuff to the left of the =>). These constraints then need to be solved in the same manner. (This is why instances are considered to overlap if they have overlapping heads, even if their contexts look mutually exclusive, and why instance Foo a => Bar a is almost never a good idea.)
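To illustrate that last point with a sketch (the classes Foo and Bar here are hypothetical, not from the original answer): because only the head is consulted for matching, an instance of the shape Foo a => Bar a claims every type for Bar, and the Foo requirement only surfaces afterwards.

{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}

class Foo a where
    foo :: a -> String

class Bar a where
    bar :: a -> String

-- The head "Bar a" matches any type whatsoever, so this instance is
-- always the one selected; the "Foo a" context is only checked after
-- the match has already been committed to.
instance Foo a => Bar a where
    bar = foo

-- A use such as `bar True` is therefore reported as a missing Foo Bool
-- instance rather than a missing Bar Bool instance, and no other Bar
-- instance can be added without overlapping this one.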
In your case, since a value of any type can be discarded in do notation, we need an instance for Term (String -> StyleM a). The instance Term (String -> StyleM ()) is more specific, so it's useless in this case. You could either write
do
    () <- flex "1"
    ...
to make the needed instance more specific, or make the provided instance more general by using the type equality trick as explained in danidiaz's answer.

Why can't I use record selectors with an existentially quantified type?

When using existential types, we have to use pattern-matching syntax to extract the forall-quantified value. We can't use the ordinary record selectors as functions. GHC reports an error and suggests using pattern matching with this definition of yALL:
{-# LANGUAGE ExistentialQuantification #-}
data ALL = forall a. Show a => ALL { theA :: a }
-- data ok
xALL :: ALL -> String
xALL (ALL a) = show a
-- pattern matching ok
-- ABOVE: heaven
-- BELOW: hell
yALL :: ALL -> String
yALL all = show $ theA all
-- record selector failed
forall.hs:11:19:
Cannot use record selector `theA' as a function due to escaped type variables
Probable fix: use pattern-matching syntax instead
In the second argument of `($)', namely `theA all'
In the expression: show $ theA all
In an equation for `yALL': yALL all = show $ theA all
Some of my data types have more than 5 fields. It's hard to maintain the code if I use positional pattern matching:
func1 (BigData _ _ _ _ elemx _ _) = func2 elemx
Is there a good method to make code like that maintainable or to wrap it up so that I can use some kind of selectors?
Existential types work in a more elaborate manner than regular types. GHC is (rightly) forbidding you from using theA as a function. But imagine there was no such prohibition. What type would that function have? It would have to be something like this:
-- Not a real type signature!
theA :: ALL -> t -- for a fresh type t on each use of theA; t is an instance of Show
To put it very crudely, forall makes GHC "forget" the type of the constructor's arguments; all that the type system knows is that this type is an instance of Show. So when you try to extract the value of the constructor's argument, there is no way to recover the original type.
What GHC does, behind the scenes, is what the comment to the fake type signature above says—each time you pattern match against the ALL constructor, the variable bound to the constructor's value is assigned a unique type that's guaranteed to be different from every other type. Take for example this code:
case ALL "foo" of
    ALL x -> show x
The variable x gets a unique type that is distinct from every other type in the program and cannot be matched with any type variable. These unique types are not allowed to escape to the top level—which is the reason why theA cannot be used as a function.
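One workaround (my own sketch, not part of the original answer) is to expose the field through a rank-2 continuation instead of a selector, so the existential type never escapes the pattern match:

{-# LANGUAGE ExistentialQuantification, RankNTypes #-}

data ALL = forall a. Show a => ALL { theA :: a }

-- The caller supplies a function that works for any Show type,
-- so the hidden type never leaks out.
withTheA :: ALL -> (forall a. Show a => a -> r) -> r
withTheA (ALL a) k = k a

yALL :: ALL -> String
yALL x = withTheA x show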
You can use record syntax in pattern matching:
func1 BigData{ someField = elemx } = func2 elemx
This works and is much less typing for huge types.
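Applied to the ALL type from the question, a record pattern gives selector-like access without the escape problem (a small sketch):

{-# LANGUAGE ExistentialQuantification #-}

data ALL = forall a. Show a => ALL { theA :: a }

-- Record syntax in the pattern is fine; only using theA as a
-- standalone selector function is rejected.
yALL :: ALL -> String
yALL ALL{ theA = a } = show a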

in haskell, why do I need to specify type constraints, why can't the compiler figure them out?

Consider the function,
add a b = a + b
This works:
*Main> add 1 2
3
However, if I add a type signature specifying that I want to add things of the same type:
add :: a -> a -> a
add a b = a + b
I get an error:
test.hs:3:10:
Could not deduce (Num a) from the context ()
arising from a use of `+' at test.hs:3:10-14
Possible fix:
add (Num a) to the context of the type signature for `add'
In the expression: a + b
In the definition of `add': add a b = a + b
So GHC clearly can deduce that I need the Num type constraint, since it just told me so. And indeed, the following works:
add :: Num a => a -> a -> a
add a b = a + b
Why does GHC require me to add the type constraint? If I'm doing generic programming, why can't it just work for anything that knows how to use the + operator?
In C++ template programming, you can do this easily:
#include <string>
#include <cstdio>

using namespace std;

template<typename T>
T add(T a, T b) { return a + b; }

int main()
{
    printf("%d, %f, %s\n",
           add(1, 2),
           add(1.0, 3.4),
           add(string("foo"), string("bar")).c_str());
    return 0;
}
The compiler figures out the types of the arguments to add and generates a version of the function for each such type. There seems to be a fundamental difference in Haskell's approach; can you describe it and discuss the trade-offs? It seems to me this would be resolved if GHC simply filled in the type constraint for me, since it obviously decided one was needed. Still, why require the type constraint at all? Why not just compile successfully as long as the function is only used in valid contexts, where the arguments are in Num?
If you don't want to specify the function's type, just leave it out and the compiler will infer the types automatically. But if you choose to specify the types, they have to be correct and accurate.
The entire point of types is to have a formal way to declare the right and wrong way to use a function. A type of (Num a) => a -> a -> a describes exactly what is required of the arguments. If you omitted the class constraint, you would have a more general function that could be used (erroneously) in more places.
And it’s not just preventing you from passing non-Num values to add. Everywhere the function goes, the type is sure to go. Consider this:
add :: a -> a -> a
add a b = a + b
foo :: [a -> a -> a]
foo = [add]
value :: [String]
value = [f "hello" "world" | f <- foo]
You want the compiler to reject this, right? How does it do that? By adding class constraints, and checking that they are not removed, even if you don’t directly name the function.
What’s different in the C++ version? There are no class constraints. The compiler substitutes int or std::string for T, then tries to compile the resulting code and looks for a matching + operator it can use. The template system is “looser”: it accepts more invalid programs, which is a symptom of template expansion being a separate stage that happens before full type checking. (I would love for C++ to gain something like the <? extends T> bounds of Java’s generics.) Just learn the type system, and recognize that parametric polymorphism is “stronger” than C++ templates, in the sense that it rejects more invalid programs.
I think you might be being tripped up by the "crazy moon poetry" of GHC's error messages. It's not saying that it (being GHC) couldn't deduce the (Num a) constraint. It is saying that the (Num a) constraint can't be deduced from your type signature, even though it knows the constraint must be there from the use of +. Hence, you are claiming that this function has a more general type than the compiler knows it can have. The compiler doesn't want you lying about your functions to the world!
In the first example you gave, without the type signature, if you run :t add in ghci, you'll see that the compiler knows full well that the (Num a) constraint is there.
As for C++'s templates, remember that they are syntactic templates and are only fully type checked at each instantiation, as they are used. Your add template will work with any types so long as, at each place it is used, there is a suitable + operator (and perhaps conversions) to make an instance of the template viable. No guarantees can be made about the template until then... which is why the body of the template must be "visible" to each module that uses it.
Basically, all C++ can do is validate the syntax of the template, and then keep it around as a kind of very hygienic macro. Whereas Haskell generates a real function for add (leaving aside that it may choose to also generate type specific specializations for optimization).
There are cases where the compiler can't figure out the right type for you and where it needs your help. Consider
f s = show $ read s
The compiler says:
Ambiguous type variable `a' in the constraints:
  `Read a' arising from a use of `read' at src\Main.hs:20:13-18
  `Show a' arising from a use of `show' at src\Main.hs:20:6-9
Probable fix: add a type signature that fixes these type variable(s)
(Strangely enough, it seems you can define this function in GHCi, but there seems to be no way to actually use it.)
If you want that something like f "1" works, you need to specify the type:
f s = show $ (read s :: Int)
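Note that giving f a top-level signature alone would not help here, because the ambiguous type is internal to the definition and never appears in f's type; the annotation has to pin down the intermediate type itself. Assuming the TypeApplications extension is enabled, a type application is an equivalent way to do that (a small sketch):

{-# LANGUAGE TypeApplications #-}

-- read @Int fixes the intermediate type: read @Int :: String -> Int
f :: String -> String
f s = show (read @Int s)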
