I am learning Haskell from learnyouahaskell.com. I am having trouble understanding type constructors and data constructors. For example, I don't really understand the difference between this:
data Car = Car { company :: String
, model :: String
, year :: Int
} deriving (Show)
and this:
data Car a b c = Car { company :: a
, model :: b
, year :: c
} deriving (Show)
I understand that the first is simply using one constructor (Car) to built data of type Car. I don't really understand the second one.
Also, how do data types defined like this:
data Color = Blue | Green | Red
fit into all of this?
From what I understand, the third example (Color) is a type which can be in three states: Blue, Green or Red. But that conflicts with how I understand the first two examples: is it that the type Car can only be in one state, Car, which can take various parameters to build? If so, how does the second example fit in?
Essentially, I am looking for an explanation that unifies the above three code examples/constructs.
In a data declaration, a type constructor is the thing on the left hand side of the equals sign. The data constructor(s) are the things on the right hand side of the equals sign. You use type constructors where a type is expected, and you use data constructors where a value is expected.
Data constructors
To make things simple, we can start with an example of a type that represents a colour.
data Colour = Red | Green | Blue
Here, we have three data constructors. Colour is a type, and Green is a constructor that contains a value of type Colour. Similarly, Red and Blue are both constructors that construct values of type Colour. We could imagine spicing it up though!
data Colour = RGB Int Int Int
We still have just the type Colour, but RGB is not a value – it's a function taking three Ints and returning a value! RGB has the type
RGB :: Int -> Int -> Int -> Colour
RGB is a data constructor that is a function taking some values as its arguments, and then uses those to construct a new value. If you have done any object-oriented programming, you should recognise this. In OOP, constructors also take some values as arguments and return a new value!
In this case, if we apply RGB to three values, we get a colour value!
Prelude> RGB 12 92 27
#0c5c1b
We have constructed a value of type Colour by applying the data constructor. A data constructor either contains a value like a variable would, or takes other values as its argument and creates a new value. If you have done previous programming, this concept shouldn't be very strange to you.
Intermission
If you'd want to construct a binary tree to store Strings, you could imagine doing something like
data SBTree = Leaf String
| Branch String SBTree SBTree
What we see here is a type SBTree that contains two data constructors. In other words, there are two functions (namely Leaf and Branch) that will construct values of the SBTree type. If you're not familiar with how binary trees work, just hang in there. You don't actually need to know how binary trees work, only that this one stores Strings in some way.
We also see that both data constructors take a String argument – this is the String they are going to store in the tree.
But! What if we also wanted to be able to store Bool, we'd have to create a new binary tree. It could look something like this:
data BBTree = Leaf Bool
| Branch Bool BBTree BBTree
Type constructors
Both SBTree and BBTree are type constructors. But there's a glaring problem. Do you see how similar they are? That's a sign that you really want a parameter somewhere.
So we can do this:
data BTree a = Leaf a
| Branch a (BTree a) (BTree a)
Now we introduce a type variable a as a parameter to the type constructor. In this declaration, BTree has become a function. It takes a type as its argument and it returns a new type.
It is important here to consider the difference between a concrete type (examples include Int, [Char] and Maybe Bool) which is a type that can be assigned to a value in your program, and a type constructor function which you need to feed a type to be able to be assigned to a value. A value can never be of type "list", because it needs to be a "list of something". In the same spirit, a value can never be of type "binary tree", because it needs to be a "binary tree storing something".
If we pass in, say, Bool as an argument to BTree, it returns the type BTree Bool, which is a binary tree that stores Bools. Replace every occurrence of the type variable a with the type Bool, and you can see for yourself how it's true.
If you want to, you can view BTree as a function with the kind
BTree :: * -> *
Kinds are somewhat like types – the * indicates a concrete type, so we say BTree is from a concrete type to a concrete type.
Wrapping up
Step back here a moment and take note of the similarities.
A data constructor is a "function" that takes 0 or more values and gives you back a new value.
A type constructor is a "function" that takes 0 or more types and gives you back a new type.
Data constructors with parameters are cool if we want slight variations in our values – we put those variations in parameters and let the guy who creates the value decide what arguments they are going to put in. In the same sense, type constructors with parameters are cool if we want slight variations in our types! We put those variations as parameters and let the guy who creates the type decide what arguments they are going to put in.
A case study
As the home stretch here, we can consider the Maybe a type. Its definition is
data Maybe a = Nothing
| Just a
Here, Maybe is a type constructor that returns a concrete type. Just is a data constructor that returns a value. Nothing is a data constructor that contains a value. If we look at the type of Just, we see that
Just :: a -> Maybe a
In other words, Just takes a value of type a and returns a value of type Maybe a. If we look at the kind of Maybe, we see that
Maybe :: * -> *
In other words, Maybe takes a concrete type and returns a concrete type.
Once again! The difference between a concrete type and a type constructor function. You cannot create a list of Maybes - if you try to execute
[] :: [Maybe]
you'll get an error. You can however create a list of Maybe Int, or Maybe a. That's because Maybe is a type constructor function, but a list needs to contain values of a concrete type. Maybe Int and Maybe a are concrete types (or if you want, calls to type constructor functions that return concrete types.)
Haskell has algebraic data types, which very few other languages have. This is perhaps what's confusing you.
In other languages, you can usually make a "record", "struct" or similar, which has a bunch of named fields that hold various different types of data. You can also sometimes make an "enumeration", which has a (small) set of fixed possible values (e.g., your Red, Green and Blue).
In Haskell, you can combine both of these at the same time. Weird, but true!
Why is it called "algebraic"? Well, the nerds talk about "sum types" and "product types". For example:
data Eg1 = One Int | Two String
An Eg1 value is basically either an integer or a string. So the set of all possible Eg1 values is the "sum" of the set of all possible integer values and all possible string values. Thus, nerds refer to Eg1 as a "sum type". On the other hand:
data Eg2 = Pair Int String
Every Eg2 value consists of both an integer and a string. So the set of all possible Eg2 values is the Cartesian product of the set of all integers and the set of all strings. The two sets are "multiplied" together, so this is a "product type".
Haskell's algebraic types are sum types of product types. You give a constructor multiple fields to make a product type, and you have multiple constructors to make a sum (of products).
As an example of why that might be useful, suppose you have something that outputs data as either XML or JSON, and it takes a configuration record - but obviously, the configuration settings for XML and for JSON are totally different. So you might do something like this:
data Config = XML_Config {...} | JSON_Config {...}
(With some suitable fields in there, obviously.) You can't do stuff like this in normal programming languages, which is why most people aren't used to it.
Start with the simplest case:
data Color = Blue | Green | Red
This defines a "type constructor" Color which takes no arguments - and it has three "data constructors", Blue, Green and Red. None of the data constructors takes any arguments. This means that there are three of type Color: Blue, Green and Red.
A data constructor is used when you need to create a value of some sort. Like:
myFavoriteColor :: Color
myFavoriteColor = Green
creates a value myFavoriteColor using the Green data constructor - and myFavoriteColor will be of type Color since that's the type of values produced by the data constructor.
A type constructor is used when you need to create a type of some sort. This is usually the case when writing signatures:
isFavoriteColor :: Color -> Bool
In this case, you are calling the Color type constructor (which takes no arguments).
Still with me?
Now, imagine you not only wanted to create red/green/blue values but you also wanted to specify an "intensity". Like, a value between 0 and 256. You could do that by adding an argument to each of the data constructors, so you end up with:
data Color = Blue Int | Green Int | Red Int
Now, each of the three data constructors takes an argument of type Int. The type constructor (Color) still doesn't take any arguments. So, my favorite color being a darkish green, I could write
myFavoriteColor :: Color
myFavoriteColor = Green 50
And again, it calls the Green data constructor and I get a value of type Color.
Imagine if you don't want to dictate how people express the intensity of a color. Some might want a numeric value like we just did. Others may be fine with just a boolean indicating "bright" or "not so bright". The solution to this is to not hardcode Int in the data constructors but rather use a type variable:
data Color a = Blue a | Green a | Red a
Now, our type constructor takes one argument (another type which we just call a!) and all of the data constructors will take one argument (a value!) of that type a. So you could have
myFavoriteColor :: Color Bool
myFavoriteColor = Green False
or
myFavoriteColor :: Color Int
myFavoriteColor = Green 50
Notice how we call the Color type constructor with an argument (another type) to get the "effective" type which will be returned by the data constructors. This touches the concept of kinds which you may want to read about over a cup of coffee or two.
Now we figured out what data constructors and type constructors are, and how data constructors can take other values as arguments and type constructors can take other types as arguments. HTH.
As others pointed out, polymorphism isn't that terrible useful here. Let's look at another example you're probably already familiar with:
Maybe a = Just a | Nothing
This type has two data constructors. Nothing is somewhat boring, it doesn't contain any useful data. On the other hand Just contains a value of a - whatever type a may have. Let's write a function which uses this type, e.g. getting the head of an Int list, if there is any (I hope you agree this is more useful than throwing an error):
maybeHead :: [Int] -> Maybe Int
maybeHead [] = Nothing
maybeHead (x:_) = Just x
> maybeHead [1,2,3] -- Just 1
> maybeHead [] -- None
So in this case a is an Int, but it would work as well for any other type. In fact you can make our function work for every type of list (even without changing the implementation):
maybeHead :: [t] -> Maybe t
maybeHead [] = Nothing
maybeHead (x:_) = Just x
On the other hand you can write functions which accept only a certain type of Maybe, e.g.
doubleMaybe :: Maybe Int -> Maybe Int
doubleMaybe Just x = Just (2*x)
doubleMaybe Nothing= Nothing
So long story short, with polymorphism you give your own type the flexibility to work with values of different other types.
In your example, you may decide at some point that String isn't sufficient to identify the company, but it needs to have its own type Company (which holds additional data like country, address, back accounts etc). Your first implementation of Car would need to change to use Company instead of String for its first value. Your second implementation is just fine, you use it as Car Company String Int and it would work as before (of course functions accessing company data need to be changed).
The second one has the notion of "polymorphism" in it.
The a b c can be of any type. For example, a can be a [String], b can be [Int]
and c can be [Char].
While the first one's type is fixed: company is a String, model is a String and year is Int.
The Car example might not show the significance of using polymorphism. But imagine your data is of the list type. A list can contain String, Char, Int ... In those situations, you will need the second way of defining your data.
As to the third way I don't think it needs to fit into the previous type. It's just one other way of defining data in Haskell.
This is my humble opinion as a beginner myself.
Btw: Make sure that you train your brain well and feel comfortable to this. It is the key to understand Monad later.
It's about types: In the first case, your set the types String (for company and model) and Int for year. In the second case, your are more generic. a, b, and c may be the very same types as in the first example, or something completely different. E.g., it may be useful to give the year as string instead of integer. And if you want, you may even use your Color type.
Related
Take a data type declaration like
data myType = Null | Container TypeA v
As I understand it, Haskell would read this as myType coming in two different flavors. One of them is Null which Haskell interprets just as some name of a ... I guess you'd call it an instance of the type? Or a subtype? Factor? Level? Anyway, if we changed Null to Nubb it would behave in basically the same way--Haskell doesn't really know anything about null values.
The other flavor is Container and I would expect Haskell to read this as saying that the Container flavor takes two fields, TypeA and v. I expect this is because, when making this type definition, the first word is always read as the name of the flavor and everything that follows is another field.
My question (besides: did I get any of that wrong?) is, how does Haskell know that TypeA is a specific named type rather than an un-typed variable? Am I wrong to assume that it reads v as an un-typed variable, and if that's right, is it because of the lower-case initial letter?
By un-typed I mean how the types appear in the following type-declaration for a function:
func :: a -> a
func a = a
First of all, terminology: "flavors" are called "cases" or "constructors". Your type has two cases - Null and Container.
Second, what you call "untyped" is not really "untyped". That's not the right way to think about it. The a in declaration func :: a -> a does not mean "untyped" the same way variables are "untyped" in JavaScript or Python (though even that is not really true), but rather "whoever calls this function chooses the type". So if I call func "abc", then I have chosen a to be String, and now the compiler knows that the result of this call must also be String, since that's what the func's signature says - "I take any type you choose, and I return the same type". The proper term for this is "generic".
The difference between "untyped" and "generic" is that "untyped" is free-for-all, the type will only be known at runtime, no guarantees whatsoever; whereas generic types, even though not precisely known yet, still have some sort of relationship between them. For example, your func says that it returns the same type it takes, and not something random. Or for another example:
mkList :: a -> [a]
mkList a = [a]
This function says "I take some type that you choose, and I will return a list of that same type - never a list of something else".
Finally, your myType declaration is actually illegal. In Haskell, concrete types have to be Capitalized, while values and type variables are javaCase. So first, you have to change the name of the type to satisfy this:
data MyType = Null | Container TypeA v
If you try to compile this now, you'll still get an error saying that "Type variable v is unknown". See, Haskell has decided that v must be a type variable, and not a concrete type, because it's lower case. That simple.
If you want to use a type variable, you have to declare it somewhere. In function declaration, type variables can just sort of "appear" out of nowhere, and the compiler will consider them "declared". But in a type declaration you have to declare your type variables explicitly, e.g.:
data MyType v = Null | Container TypeA v
This requirement exist to avoid confusion and ambiguity in cases where you have several type variables, or when type variables come from another context, such as a type class instance.
Declared this way, you'll have to specify something in place of v every time you use MyType, for example:
n :: MyType Int
n = Null
mkStringContainer :: TypeA -> String -> MyType String
mkStringContainer ta s = Container ta s
-- Or make the function generic
mkContainer :: TypeA -> a -> MyType a
mkContainer ta a = Container ta a
Haskell uses a critically important distinction between variables and constructors. Variables begin with a lower-case letter; constructors begin with an upper-case letter1.
So data myType = Null | Container TypeA v is actually incorrect; the first symbol after the data keyword is the name of the new type constructor you're introducing, so it must start with a capital letter.
Assuming you've fixed that to data MyType = Null | Container TypeA v, then each of the alternatives separated by | is required to consist of a data constructor name (here you've chosen Null and Container) followed by a type expression for each of the fields of that constructor.
The Null constructor has no fields. The Container constructor has two fields:
TypeA, which starts with a capital letter so it must be a type constructor; therefore the field is of that concrete type.
v, which starts with a lowercase letter and is therefore a type variable. Normally this variable would be defined as a type parameter on the MyType type being defined, like data MyType v = Null | Container TypeA v. You cannot normally use free variables, so this was another error in your original example.2
Your data declaration showed how the distinction between constructors and variables matters at the type level. This distinction between variables and constructors is also present at the value level. It's how the compiler can tell (when you're writing pattern matches) which terms are patterns it should be checking the data against, and which terms are variables that should be bound to whatever the data contains. For example:
lookAtMaybe :: Show a => Maybe a -> String
lookAtMaybe Nothing = "Nothing to see here"
lookAtMaybe (Just x) = "I found: " ++ show x
If Haskell didn't have the first-letter rule, then there would be two possible interpretations of the first clause of the function:
Nothing could be a reference to the externally-defined Nothing constructor, saying I want this function rule to apply when the argument matches that constructor. This is the interpretation the first-letter rule mandates.
Nothing could be a definition of an (unused) variable, representing the function's argument. This would be the equivalent of lookAtMaybe x = "Nothing to see here"
Both of those interpretations are valid Haskell code producing different behaviour (try changing the capital N to a lower case n and see what the function does). So Haskell needs a rule to choose between them. The designers chose the first-letter rule as a way of simply disambiguating constructors from variables (that is simple to both the compiler and to human readers) without requiring any additional syntactic noise.
1 The rule about the case of the first letter applies to alphanumeric names, which can only consist of letters, numbers, and underscores. Haskell also has symbolic names, which consists only of symbol characters like +, *, :, etc. For these, the rule is that names beginning with the : character are constructors, while names beginning with another character are variables. This is how the list constructor : is distinguished from a function name like +.
2 With the ExistentialQuantification extension turned on it is possible to write data MyType = Null | forall v. Container TypeA v, so that the the constructor has a field with a variable type and the variable does not appear as a parameter to the overall type. I'm not going to explain how this works here; it's generally considered an advanced feature, and isn't part of standard Haskell code (which is why it requires an extension)
Haskell enables one to construct algebraic data types using type constructors and data constructors. For example,
data Circle = Circle Float Float Float
and we are told this data constructor (Circle on right) is a function that constructs a circle when give data, e.g. x, y, radius.
Circle :: Float -> Float -> Float -> Circle
My questions are:
What is actually constructed by this function, specifically?
Can we define the constructor function?
I've seen Smart Constructors but they just seem to be extra functions that eventually call the regular constructors.
Coming from an OO background, constructors, of course, have imperative specifications. In Haskell, they seem to be system-defined.
In Haskell, without considering the underlying implementation, a data constructor creates a value, essentially by fiat. “ ‘Let there be a Circle’, said the programmer, and there was a Circle.” Asking what Circle 1 2 3 creates is akin to asking what the literal 1 creates in Python or Java.
A nullary constructor is closer to what you usually think of as a literal. The Boolean type is literally defined as
data Boolean = True | False
where True and False are data constructors, not literals defined by Haskell grammar.
The data type is also the definition of the constructor; as there isn't really anything to a value beyond the constructor name and its arguments, simply stating it is the definition. You create a value of type Circle by calling the data constructor Circle with 3 arguments, and that's it.
A so-called "smart constructor" is just a function that calls a data constructor, with perhaps some other logic to restrict which instances can be created. For example, consider a simple wrapper around Integer:
newtype PosInteger = PosInt Integer
The constructor is PosInt; a smart constructor might look like
mkPosInt :: Integer -> PosInteger
mkPosInt n | n > 0 = PosInt n
| otherwise = error "Argument must be positive"
With mkPosInt, there is no way to create a PosInteger value with a non-positive argument, because only positive arguments actually call the data constructor. A smart constructor makes the most sense when it, and not the data constructor, is exported by a module, so that a typical user cannot create arbitrary instances (because the data constructor does not exist outside the module).
Good question. As you know, given the definition:
data Foo = A | B Int
this defines a type with a (nullary) type constructor Foo and two data constructors, A and B.
Each of these data constructors, when fully applied (to no arguments in the case of A and to a single Int argument in the case of B) constructs a value of type Foo. So, when I write:
a :: Foo
a = A
b :: Foo
b = B 10
the names a and b are bound to two values of type Foo.
So, data constructors for type Foo construct values of type Foo.
What are values of type Foo? Well, first of all, they are different from values of any other type. Second, they are wholly defined by their data constructors. There is a distinct value of type Foo, different from all other values of Foo, for each combination of a data constructor with a set of distinct arguments passed to that data constructor. That is, two values of type Foo are identical if and only if they were constructed with the same data constructor given identical sets of arguments. ("Identical" here means something different from "equality", which may not necessarily be defined for a given type Foo, but let's not get into that.)
That's also what makes data constructors different from functions in Haskell. If I have a function:
bar :: Int -> Bool
It's possible that bar 1 and bar 2 might be exactly the same value. For example, if bar is defined by:
bar n = n > 0
then it's obvious that bar 1 and bar 2 (and bar 3) are identically True. Whether the value of bar is the same for different values of its arguments will depend on the function definition.
In contrast, if Bar is a constructor:
data BarType = Bar Int
then it's never going to be the case that Bar 1 and Bar 2 are the same value. By definition, they will be different values (of type BarType).
By the way, the idea that constructors are just a special kind of function is a common viewpoint. I personally think this is inaccurate and causes confusion. While it's true that constructors can often be used as if they are functions (specifically that they behave very much like functions when used in expressions), I don't think this view stands up under much scrutiny -- constructors are represented differently in the surface syntax of the language (with capitalized identifiers), can be used in contexts (like pattern matching) where functions cannot be used, are represented differently in compiled code, etc.
So, when you ask "can we define the constructor function", the answer is "no", because there is no constructor function. Instead, a constructor like A or B or Bar or Circle is what it is -- something different from a function (that sometimes behaves like a function with some special additional properties) which is capable of constructing a value of whatever type the data constructor belongs to.
This makes Haskell constructors very different from OO constructors, but that's not surprising since Haskell values are very different from OO objects. In an OO language, you can typically provide a constructor function that does some processing in building the object, so in Python you might write:
class Bar:
def __init__(self, n):
self.value = n > 0
and then after:
bar1 = Bar(1)
bar2 = Bar(2)
we have two distinct objects bar1 and bar2 (which would satify bar1 != bar2) that have been configured with the same field values and are in some sense "equal". This is sort of halfway between the situation above with bar 1 and bar 2 creating two identical values (namely True) and the situation with Bar 1 and Bar 2 creating two distinct values that, by definition, can't possibly be the "same" in any sense.
You can never have this situation with Haskell constructors. Instead of thinking of a Haskell constructor as running some underlying function to "construct" an object which might involve some cool processing and deriving of field values, you should instead think of a Haskell constructor as a passive tag attached to a value (which may also contain zero or more other values, depending on the arity of the constructor).
So, in your example, Circle 10 20 5 doesn't "construct" an object of type Circle by running some function. It directly creates a tagged object that, in memory, will look something like:
<Circle tag>
<Float value 10>
<Float value 20>
<Float value 5>
(or you can at least pretend that's what it looks like in memory).
The closest you can come to OO constructors in Haskell is using smart constructors. As you note, eventually a smart constructor just calls a regular constructor, because that's the only way to create a value of a given type. No matter what kind of bizarre smart constructor you build to create a Circle, the value it constructs will need to look like:
<Circle tag>
<some Float value>
<another Float value>
<a final Float value>
which you'll need to construct with a plain old Circle constructor call. There's nothing else the smart constructor could return that would still be a Circle. That's just how Haskell works.
Does that help?
I’m going to answer this in a somewhat roundabout way, with an example that I hope illustrates my point, which is that Haskell decouples several distinct ideas that are coupled in OOP under the concept of a “class”. Understanding this will help you translate your experience from OOP into Haskell with less difficulty. The example in OOP pseudocode:
class Person {
private int id;
private String name;
public Person(int id, String name) {
if (id == 0)
throw new InvalidIdException();
if (name == "")
throw new InvalidNameException();
this.name = name;
this.id = id;
}
public int getId() { return this.id; }
public String getName() { return this.name; }
public void setName(String name) { this.name = name; }
}
In Haskell:
module Person
( Person
, mkPerson
, getId
, getName
, setName
) where
data Person = Person
{ personId :: Int
, personName :: String
}
mkPerson :: Int -> String -> Either String Person
mkPerson id name
| id == 0 = Left "invalid id"
| name == "" = Left "invalid name"
| otherwise = Right (Person id name)
getId :: Person -> Int
getId = personId
getName :: Person -> String
getName = personName
setName :: String -> Person -> Either String Person
setName name person = mkPerson (personId person) name
Notice:
The Person class has been translated to a module which happens to export a data type by the same name—types (for domain representation and invariants) are decoupled from modules (for namespacing and code organisation).
The fields id and name, which are specified as private in the class definition, are translated to ordinary (public) fields on the data definition, since in Haskell they’re made private by omitting them from the export list of the Person module—definitions and visibility are decoupled.
The constructor has been translated into two parts: one (the Person data constructor) that simply initialises the fields, and another (mkPerson) that performs validation—allocation & initialisation and validation are decoupled. Since the Person type is exported, but its constructor is not, this is the only way for clients to construct a Person—it’s an “abstract data type”.
The public interface has been translated to functions that are exported by the Person module, and the setName function that previously mutated the Person object has become a function that returns a new instance of the Person data type that happens to share the old ID. The OOP code has a bug: it should include a check in setName for the name != "" invariant; the Haskell code can avoid this by using the mkPerson smart constructor to ensure that all Person values are valid by construction. So state transitions and validation are also decoupled—you only need to check invariants when constructing a value, because it can’t change thereafter.
So as for your actual questions:
What is actually constructed by this function, specifically?
A constructor of a data type allocates space for the tag and fields of a value, sets the tag to which constructor was used to create the value, and initialises the fields to the arguments of the constructor. You can’t override it because the process is completely mechanical and there’s no reason (in normal safe code) to do so. It’s an internal detail of the language and runtime.
Can we define the constructor function?
No—if you want to perform additional validation to enforce invariants, you should use a “smart constructor” function which calls the lower-level data constructor. Because Haskell values are immutable by default, values can be made correct by construction; that is, when you don’t have mutation, you don’t need to enforce that all state transitions are correct, only that all states themselves are constructed correctly. And often you can arrange your types so that smart constructors aren’t even necessary.
The only thing you can change about the generated data constructor “function” is making its type signature more restrictive using GADTs, to help enforce more invariants at compile-time. And as a side note, GADTs also let you do existential quantification, which lets you carry around encapsulated/type-erased information at runtime, exactly like an OOP vtable—so this is another thing that’s decoupled in Haskell but coupled in typical OOP languages.
Long story short (too late), you can do all the same things, you just arrange them differently, because Haskell provides the various features of OOP classes under separate orthogonal language features.
Sum type
The Maybe Int type is a sum type.
data Maybe Int = Nothing | Just Int
From my understanding, this is due to the fact that the Nothing value constructor takes no arguments and the second value constructor called Just takes only one argument. Therefore, because no value constructor takes more than one argument, this type is a product type.
Product type
The type below is a product type since its data constructor takes two arguments and therefore is a product type.
data Colour = Person String Int
However, I am not sure how we would classify the following type in the context of the sum and product types. How should we refer to this?
data Shade = RGB Int Int Int | Transparent
All data types are "sums of products".
We sum over the number of constructors, and for each constructor we multiply over the number of arguments.
Sometimes the sum is trivial. When there is a single constructor, or none at all, we sum over a singleton or empty set. Summing over a single constructor makes the resulting type isomorphic to a product. Summing over no constructors makes the type to be empty (e.g. Data.Void.Void).
Sometimes, some of the products are trivial as well. When there is a single argument, or none at all, we multiply over a singleton or empty set. Multiplying over a single argument T simply produces T (after lifting). Multiplying over no arguments produces a type with only one value (e.g. ()).
Hence, sometimes our data is a non-trivial sum of trivial products, and we call it a "sum"; sometimes it is a trivial sum of non-trivial products, and we call it a "product". But, in the general case, it is always a "sum of products".
Note that algebraic types (up to isomorphism) form a commutative semiring, satisfying roughly the same laws of high school algebra for sums and products. In high school algebra, we can turn any expression involving nested sums and products into a polynomial, i.e. into a "sum of products". This also happens with types (up to isomorphism), hence the choice of making data types to be "sums of products" is rather expressive.
The Maybe Int type is a sum type because it has an alternation
data SumType = This | That
the fact that it has arguments on its constructors doesn't affect its "sum-ness." Sum types can also contain product constructors, such as:
type Username = String
type Email = String
-- User is a sum of three products
data User = NotLoggedIn -- nullary constructor
| Guest Username -- unary constructor
| RegisteredUser Username Email -- binary constructor
I'm a beginner in haskell and I wonder about the right way to define a new type. Suppose I want to define a Point type. In an imperative language, it's usually the equivalent of:
data Point = Int Int
However in haskell I usually see definitions such as:
data Point = Point Int Int
What are the differences and when should each approach be used?
In OO languages you can define a class with something like this
class Point {
int x,y;
Point(int x, int y) {...
}
it's similar
data Point = ...
is the type definition (similar to class Point above , and
... = Point Int Int
is the constructor, you can also define the constructor with a different name, but you need a name regardless.
data Point = P Int Int
The data definitions are, ultimately, tagged unions. For example:
data Maybe a = Nothing | Just a
Now how would you write this type using your syntax?
Moreover it remains the fact that in Haskell you can pattern match over this values and see which constructor was used to build a value. The name of the constructor is needed for pattern matching, and if the type has just one constructor it often re-uses the same name as the type.
For example:
let x = someOperationReturningMaybe
in case x of
Nothing -> 0
Just y -> y+5
This is different from plain union type, such as C's union where you can say "this thing is etiher an int or a float" but you have no way to know which one it actually is (except by keeping track of the state by hand).
Writing the code above using a C union you have no way to use a case to perform different actions depending on the constructor used, and you have to keep track explicitly what type is contained in that x and use an if.
I wrote the hsexif library and I would now add a feature, but I'm not sure how to prepare the API.
I have the ExifValue type. A ExifValue can be among others a ExifRational, which has a numerator and a denominator. Often you want to display that value (show) as "num/den", for instance for an exposition time of 1/160.
However sometimes you want to show it as a floating-point number, for instance for the exposure compensation, which you would display as "-0.75" for instance, or the aperture ("6.3").
So I want to add a function:
formatAsFloatingPoint :: ExifValue -> Int -> String
The function takes the exif value and the number of floating points after the comma to output in the result string, and returns the formatted string.
However the function will then accept any ExifValue and the user will get a runtime error and no compile time warning if it gives a ExifText as a parameter to that function...
How would I go to make a clean and type-safe API in this case?
You need to think about how you expect this to be used.
The caller might always know they have an ExifRational and will only call formatAsFloatingPoint with such a value. In that case it would make sense to refactor your datatype:
data Rational = Rational !Int !Int
data ExifValue = ... | ExifRational Rational | ...
(or perhaps reuse some existing type for expressing rationals)
and then make formatAsFloatingPoint take a Rational:
formatAsFloatingPoint :: Rational -> Int -> String
This moves the responsibility to the caller to decide when to call it.
Alternatively, perhaps callers just want to display an arbitrary ExifValue, but with special behaviour if the value happens to be an ExifRational. In that case, just use a catch-all case, e.g.:
formatAsFloatingPoint :: ExifValue -> Int -> String
formatAsFloatingPoint n (ExifRational num den) = ...
formatAsFloatingPoint _ v = show v
There are more complicated approaches based on using a type parameter to flag what kind of thing you have, but that would involve refactoring the entire library and there's little evidence that's warranted here. If you have a more general problem across the codebase of wanting to signal that you have specific kinds of ExifValue, they might make sense.