Record syntax and sum types - haskell

I have this question about sum types in Haskell.
I'd like to create a sum type which is comprised of two or more other types, and each of the types may contain multiple fields. A trivial example would be like this:
data T3 = T1 { a :: Int, b :: Float} | T2 { x :: Char } deriving (Show)
In my understanding, T1 and T2 are data constructors that use record syntax. It seems that the definition of T3 will grow as the number of fields in T1 or T2 increases. My question is: how can these sum type constructors be handled in practice when the number of fields is large? Or, is it a good idea to mix sum types with record syntax at all?

I don't quite understand what concerns you have, but to answer the question in the last line: no, it is generally not a good idea to mix sum types with record syntax. Records remain a bit of a weak spot of the Haskell language; they don't handle scoping very well at all. It's usually fine as long as you just have some separate types with different record labels, but as soon as sum types or name clashes come in, it gets rather nasty.
In particular, Haskell permits you to use a record field accessor of the T1 constructor on any value of type T3: print $ a (T2 'x') will compile without warnings, but fail at runtime with an error that is rather hard to foresee.
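To make the hazard concrete, here is a minimal sketch using the question's own T3. The helper safeA is a hypothetical name introduced only for illustration; it shows one way to get a total accessor by pattern matching instead of relying on the generated field selector.

```haskell
data T3 = T1 { a :: Int, b :: Float } | T2 { x :: Char } deriving (Show)

-- The generated selector has type  a :: T3 -> Int,
-- so  a (T2 'x')  typechecks even though T2 has no field `a`;
-- evaluating it raises a runtime exception.

-- Hypothetical total alternative: make the missing case explicit.
safeA :: T3 -> Maybe Int
safeA (T1 n _) = Just n
safeA (T2 _)   = Nothing
```

With this, safeA (T2 'x') is Nothing rather than a crash, at the cost of writing the accessor by hand.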
In your example, it fortunately looks like you can easily avoid that kind of trouble:
data T3 = T3_1 T1 | T3_2 T2
  deriving (Show)
data T1 = T1 { a :: Int
             , b :: Float }
  deriving (Show)
data T2 = T2 { x :: Char }
  deriving (Show)
Now, any deconstruction you could write will be properly typechecked to make sense.
And such a structure of meaningful, small specialised sub-types† is generally better to handle than a single monolithic type, especially if you have many functions that really deal only with part of the data structure.
The flip side is that it gets quadratically tedious to unwrap the layers of constructors, but that's fortunately a solved problem now: lens libraries allow you to compose accessor/modifiers very neatly.
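As a rough illustration of what composing accessors looks like, here is a hand-rolled sketch that uses only base. The names Traversal, _T3_1, aL, over and toListOf are defined locally for this example (they mirror the style of lens libraries but are assumptions of this sketch, not an import from any particular package); a real lens library would provide equivalents and generate the per-field definitions for you.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

data T1 = T1 { a :: Int, b :: Float } deriving (Show, Eq)
data T2 = T2 { x :: Char } deriving (Show, Eq)
data T3 = T3_1 T1 | T3_2 T2 deriving (Show, Eq)

-- A van Laarhoven style traversal: focuses on zero or more `t`s inside an `s`.
type Traversal s t = forall f. Applicative f => (t -> f t) -> s -> f s

-- Focus on the T1 payload, if the value was built with T3_1.
_T3_1 :: Traversal T3 T1
_T3_1 f (T3_1 t1) = T3_1 <$> f t1
_T3_1 _ other     = pure other

-- Focus on the field `a` of a T1.
aL :: Traversal T1 Int
aL f t1 = (\n -> t1 { a = n }) <$> f (a t1)

-- Modify every focused value.
over :: Traversal s t -> (t -> t) -> s -> s
over l g = runIdentity . l (Identity . g)

-- Collect every focused value.
toListOf :: Traversal s t -> s -> [t]
toListOf l = getConst . l (\v -> Const [v])
```

The two layers compose with ordinary function composition: over (_T3_1 . aL) (+ 1) increments the field a when the value is a T3_1 and leaves a T3_2 untouched, while toListOf (_T3_1 . aL) returns an empty list for a T3_2 instead of failing.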
Speaking of solved problems: Nikita Volkov has come up with a really nice concept for entirely replacing the problem-ridden record syntax.
†Um... actually these aren't subtypes in any proper sense of the word, but you get what I mean.

Related

Sum types - Why in Haskell is `show (Int | Double)` different than `(show Int) | (show Double)`

Why are these not equivalent?
show $ if someCondition then someInt else someDouble
and
if someCondition then show someInt else show someDouble
I understand that if you isolate the if ... else part of the first example as an expression by itself, you can't represent its type with an anonymous sum type such as Int | Double, which is easy to do in TypeScript (I mention TypeScript because it is the language I use most often and it supports such types). You would have to resort to the Either data type and then call show based on that.
The example I gave here is trivial but to me it makes more sense to think "Okay we are going to show something, and that something depends on someCondition" rather than "Okay if someCondition is true then show someInt otherwise show someDouble", and also allows for less code duplication (here the show is repeated twice but it could also be a long function application and instead of an if ... else there could be >2 branches to consider)
In my mind it should be easy for the compiler to check whether each of the types that make up the sum type (here Int | Double) can be used as a parameter to the show function, and to decide whether the types are correct or not. Even better, the show function always returns a string no matter the types of its parameters, so the compiler doesn't have to carry along all the possible "branches" (that is, all the possible types).
Is it by choice that such a feature doesn't exist? Or is implementing it harder than I think?
All parts of an expression must be well-typed. The type of if someCondition then someInt else someDouble would have to be something like exists a. Show a => a, but Haskell doesn't support that kind of existential quantification.
Update: As chi points out in a comment, this would also be possible if Haskell had support for union/intersection types (which are not the same as sum/product types), but it unfortunately doesn't.
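For comparison, the closest thing GHC offers today is an existential wrapper type, enabled by the ExistentialQuantification extension. The sketch below is only an illustration: Showable and describe are hypothetical names, and someInt and someDouble are turned into parameters so the block stands alone.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Wraps "some value whose type has a Show instance".
data Showable = forall a. Show a => Showable a

instance Show Showable where
  show (Showable v) = show v

-- Both branches now have the same type, Showable, so the `if` is well-typed
-- and `show` is written only once.
describe :: Bool -> Int -> Double -> String
describe someCondition someInt someDouble =
  show (if someCondition then Showable someInt else Showable someDouble)
```

The wrapping has to be written explicitly in each branch, which is exactly the overhead the question hoped to avoid.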
There are product types with lightweight syntax, written (,), in Haskell. One would think that a sum type with a lightweight syntax, something like (Int | String), would be a great idea. The reality is more complicated. Let's see why (I'm taking some liberties with Num; they are not important).
if someCondition then 42 else "helloWorld"
If this should return a value of type like (Int | String), then what should the following return?
if someCondition then 42 else 0
(Int | Int) obviously, but if this is distinct from plain Int then we're in deep trouble. So (Int | Int) should be identical to plain Int.
One can immediately see that this is not just lightweight syntax for sum types, but a wholly new language feature. A different kind of type system if you will. Should we have one?
Let's look at this function.
mysteryType x a b = if x then a else b
Now what type does mysteryType have? Obviously
mysteryType :: Bool -> a -> b -> (a|b)
right? Now what if a and b are the same type?
let x = mysteryType True 42 0
This should be plain Int, as we agreed previously. Now mysteryType sometimes returns an anonymous sum type and sometimes does not, depending on what arguments you pass. How would you pattern match on such an expression? What on Earth can you do with it? Except for trivial things like "show" (or whatever methods of other type classes it would be an instance of), not a whole lot. Unless you add run-time type information to the language, that is, so that typeof is available, and that would make Haskell an entirely different language.
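For contrast, here is a sketch of the same function written with the ordinary tagged sum type Either. The function render is a hypothetical consumer added for illustration. Because every result carries a Left or Right tag, Either Int Int stays distinct from Int and pattern matching is always possible.

```haskell
-- Same shape as the hypothetical (a|b) version, but with explicit tags.
mysteryType :: Bool -> a -> b -> Either a b
mysteryType c l r = if c then Left l else Right r

-- A consumer can always tell which branch was taken.
render :: Either Int String -> String
render = either show id
```

Here mysteryType True 42 0 is Left 42, not a bare 42, so the question of what happens when both sides have the same type never arises.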
So yeah. Why isn't Haskell a TypeScript? Because we don't need another TypeScript. If you want TypeScript, you know where to find it.

understanding data structure in Haskell

I have a problem with a homework (the topic is : "functional data structures").
Please understand that I don't want anyone to solve my homework.
I just have a problem with understanding the structure of this :
data Heap e t = Heap {
  empty :: t e,
  insert :: e -> t e -> t e,
  findMin :: t e -> Maybe e,
  deleteMin :: t e -> Maybe (t e),
  merge :: t e -> t e -> t e,
  contains :: e -> t e -> Maybe Int
}
In my understanding, "empty", "insert" and so on are functions which can be applied to "Heap"-type data.
Now I just want to understand what that "Heap" thing looks like.
So I was typing things like :
a = Heap 42 42
But I get errors I can't really work with.
Maybe it is a dumb question and I'm just stuck at this point for no reason, but it is killing me at the moment.
Thanks for any help.
If you truly wish to understand that type, you need to understand a few requisites first.
types and values (and functions)
Firstly, you need to understand what types and values are. I'm going to assume you understand this. You understand, for example, the separation between "hello" as a value and its type, String, and you understand clearly what it means when I say a = "hello" :: String and:
a :: String
a = "hello"
If you don't understand that, then you need to research values and types in Haskell. There are a myriad of books that can help here, such as this one, which I helped to author: http://happylearnhaskelltutorial.com
I'm also going to assume you understand what functions and currying are, and how to use both of them.
polymorphic types
Secondly, as your example contains type variables, you'll need to understand what they are. That is, you need to understand what polymorphic types are: for example, Maybe a or Either a b. You'll need to understand how Maybe String differs from Maybe Int, and what Num a => [a] and even Num a => [Maybe a] mean.
Again, there are many free or paid books that can help, the example above covers this, too.
algebraic data types
Next up is algebraic data types. This is a pretty amazingly cool feature that Haskell has. Haskell-like languages such as Elm and Idris have it too, as do others such as Rust. It lets you define your own data types. These aren't just things like structs in other languages, and yeah, they can even contain functions.
Maybe is actually an example of an algebraic data type. If you understand these, you'll know that:
data Direction = North | South | East | West
defines a data type called Direction whose values can only be one of North, South, East or West, and you'll know that you can also use the polymorphic type variables above to parameterise your types like so:
data Tree a = EmptyNode | Node a (Tree a) (Tree a)
which uses both optionality (as in the sum type of Direction above) as well as parameterization.
In addition to this, you can also have multiple types in each value. These are called product types, and Haskell's algebraic datatypes can be expressed as a combination of Sum types that can contain Product types. For example:
type Location = (Float, Float)
data ShapeNode = StringNode Location String | CircleNode Location Float | SquareNode Location Float Float
That is, each value can be one of StringNode, CircleNode or SquareNode, and in each case a different set of fields is given to the value. To create a StringNode, for example, you'd need to pass the values to its constructor like this: StringNode (10.0, 5.3) "A String".
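To show how such a sum of products is consumed, here is a small sketch that pattern matches on each constructor. The function area is a hypothetical example added for illustration, and treating a StringNode as having zero area is an assumption of this sketch.

```haskell
type Location = (Float, Float)

data ShapeNode
  = StringNode Location String
  | CircleNode Location Float
  | SquareNode Location Float Float

-- One equation per constructor; each pattern binds that constructor's fields.
area :: ShapeNode -> Float
area (StringNode _ _)   = 0          -- assumption: text occupies no area
area (CircleNode _ r)   = pi * r * r
area (SquareNode _ w h) = w * h
```

If a constructor is later added to ShapeNode, compiling with warnings enabled points out that area no longer covers every case.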
Again, the freely available books will go into much more detail about these things, but we're moving in the direction of getting more than a basic understanding of Haskell now.
Finally, in order to fully understand your example, you'll need to know about...
record types
Record types are the same as product types above, except that the fields are labelled rather than being anonymous. So, you could define the shape node data type like this, instead:
type Location = (Float, Float)
data ShapeNode
  = StringNode { stringLocation :: Location, stringData :: String }
  | CircleNode { circleLocation :: Location, radius :: Float }
  | SquareNode { squareLocation :: Location, length :: Float, height :: Float }
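Each field is named, and you can't repeat the same name within a single constructor. (The same field name may appear in several constructors of one type, but only if it has the same type in each.)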
All that you need in addition to this to understand the above example is to realise your example contains all of these things together, along with the fact that you have functions as your record field values in the data type you have.
It's a good idea to thoroughly flesh out your understanding and not skip any steps, then you'll be able to follow these kinds of things much more easily in the future. :) I wish you luck!
Heap is a record with six elements. In order to create a value of that type, you must supply all six elements. Assuming that you have appropriate values and functions, you can create a value like this:
myHeap = Heap myEmpty myInsert myFindMin myDeleteMin myMerge myContains
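As a concrete sketch of what such a value could look like, the following fills in all six fields using a sorted list as the container. Everything here is an assumption made for illustration: listHeap is a hypothetical name, a sorted list is just the simplest backing structure, and contains is assumed to return the position of the element, since the assignment's intended meaning of Maybe Int is not stated.

```haskell
import qualified Data.List as L

data Heap e t = Heap
  { empty     :: t e
  , insert    :: e -> t e -> t e
  , findMin   :: t e -> Maybe e
  , deleteMin :: t e -> Maybe (t e)
  , merge     :: t e -> t e -> t e
  , contains  :: e -> t e -> Maybe Int
  }

-- A Heap record whose container type t is the list type [].
-- The list is kept sorted, so the minimum is always at the head.
listHeap :: Ord e => Heap e []
listHeap = Heap
  { empty     = []
  , insert    = L.insert
  , findMin   = \xs -> case xs of { [] -> Nothing; (y : _) -> Just y }
  , deleteMin = \xs -> case xs of { [] -> Nothing; (_ : rest) -> Just rest }
  , merge     = \xs ys -> foldr L.insert ys xs
  , contains  = L.elemIndex   -- assumption: index of the element, if present
  }
```

The fields are then used as functions that take the record first, for example findMin listHeap (insert listHeap 3 (empty listHeap)).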
This doesn't seem like idiomatic Haskell design, however. Why not define generic functions independent of the data, or, if they must be bundled together, a typeclass?

Redundancy regarding product types and tuples in Haskell

In Haskell you have product types and you have tuples.
You use tuples if you don't want to associate a dedicated type with the value, and you can use product types if you wish to do so.
However I feel there is redundancy in the notation of product types
data Foo = Foo (String, Int, Char)
data Bar = Bar String Int Char
Why are there both kinds of notation? Is there any case where you would prefer one over the other?
I guess you can't use record notation when using tuples, but that's just a convenience problem. Another thing might be the notion of order in tuples, as opposed to product types, but I think that's just due to the naming of the functions fst and snd.
@chi's answer is about the technical differences in terms of Haskell's evaluation model. I hope to give you some insight into the philosophy of this sort of typed programming.
In category theory we generally work with objects "up to isomorphism". Your Bar is of course isomorphic to (String, Int, Char), so from a categorical perspective they're the same thing.
import Control.Lens (Iso', iso)  -- Iso' and iso as provided by the lens package

bar_tuple :: Iso' Bar (String, Int, Char)
bar_tuple = iso to from
  where to (Bar s i c) = (s, i, c)
        from (s, i, c) = Bar s i c
In some sense tuples are a Platonic form of product type, in that they have no meaning beyond being a collection of disparate values. All the other product types can be mapped to and from a plain old tuple.
So why not use tuples everywhere, when all Haskell types ultimately boil down to a sum of products? It's about communication. As Martin Fowler says,
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
Names are important! Writing down a custom product type like
data Customer = Customer { name :: String, address :: String }
imbues the type Customer with meaning to the person reading the code, unlike (String, String) which just means "two strings".
Custom types are particularly useful when you want to enforce invariants by hiding the representation of your data and using smart constructors:
newtype NonEmpty a = NonEmpty [a]
nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty [] = Nothing
nonEmpty xs = Just (NonEmpty xs)
Now, if you don't export the NonEmpty constructor, you can force people to go through the nonEmpty smart constructor. If someone hands you a NonEmpty value you may safely assume that it has at least one element.
You can of course represent Customer as a tuple under the hood and expose evocatively-named field accessors,
newtype Customer = Customer (String, String)
name, address :: Customer -> String
name (Customer (n, a)) = n
address (Customer (n, a)) = a
but this doesn't really buy you much, except that it's now cheaper to convert Customer to a tuple (if, say, you're writing performance-sensitive code that works with a tuple-oriented API).
If your code is intended to solve a particular problem - which of course is the whole point of writing code - it pays to not just solve the problem, but make it look like you've solved it too. Someone - maybe you in a couple of years - is going to have to read this code and understand it with no a priori knowledge of how it works. Custom types are a very important communication tool in this regard.
The type
data Foo = Foo (String, Int, Char)
represents a double-lifted tuple. Its values comprise
undefined
Foo undefined
Foo (undefined, undefined, undefined)
etc.
This is usually troublesome. Because of this, it's rare to see such definitions in actual code. We either have plain data types
data Foo = Foo String Int Char
or newtypes
newtype Foo = Foo (String, Int, Char)
The newtype can be just as inconvenient to use, but at least it does not double-lift the tuple: undefined and Foo undefined are now equal values.
The newtype also provides zero-cost conversion between a plain tuple and Foo, in both directions.
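The difference in lifting can be observed directly. In the sketch below, FooD and FooN are hypothetical names for the data and newtype variants of Foo, and isBottom is a small helper written for this example that reports whether evaluating a value throws.

```haskell
import Control.Exception (SomeException, evaluate, try)

data    FooD = FooD (String, Int, Char)
newtype FooN = FooN (String, Int, Char)

-- Matching on a data constructor forces the scrutinee.
matchD :: FooD -> Bool
matchD (FooD _) = True

-- Matching on a newtype constructor is free: there is no wrapper to force.
matchN :: FooN -> Bool
matchN (FooN _) = True

-- Helper for this example: True if evaluating the value to WHNF throws.
isBottom :: a -> IO Bool
isBottom v = do
  r <- try (evaluate v)
  pure (case r of
          Left e  -> const True (e :: SomeException)
          Right _ -> False)
```

Here matchN undefined evaluates to True, whereas evaluating matchD undefined throws, because undefined and FooD undefined are distinct values of the data type.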
You can see such newtypes in use e.g. when the programmer needs a different instance for some type class, than the one already associated with the tuple. Or, perhaps, it is used in a "smart constructor" idiom.
I would not expect the pattern used in Foo to be frequent. There is a slight difference in how the constructor behaves: Foo :: (String, Int, Char) -> Foo as opposed to Bar :: String -> Int -> Char -> Bar. Then Foo undefined and Foo (undefined, ..., ...) are, strictly speaking, different things, whereas Bar has one level of undefinedness fewer.

What is the name for the contrary of Tuple or Either with more than two options?

There is a Tuple as a Product of any number of types and there is an Either as a Sum of two types. What is the name for a Sum of any number of types, something like this
data Thing a b c d ... = Thing1 a | Thing2 b | Thing3 c | Thing4 d | ...
Is there any standard implementation?
Before I make the suggestion against using such types, let me explain some background.
Either is a sum type, and a pair or 2-tuple is a product type. Sums and products can exist over arbitrarily many underlying types (sets). However, in Haskell, only tuples come in a variety of sizes out of the box. Either, on the other hand, has to be (arbitrarily) nested to achieve that: Either Foo (Either Bar Baz).
Of course it's easy to instead define e.g. the types Either3 and Either4 etc, in the spirit of 3-tuples, 4-tuples and so on.
data Either3 a b c = Left3 a | Middle3 b | Right3 c
data Either4 a b c d = LeftMost a | Left4 b | Right4 c | RightMost d
...if you really want. Or you can find a library that does this, but I doubt you could call it "standard" by any standards...
However, if you do define your own generic sum and product types, they will be completely isomorphic to any type that is structurally equivalent, regardless of where it is defined. This means that you can, with relative ease, nicely adapt your code to interface with any other code that uses an alternative definition.
Furthermore, it is very likely to be beneficial, because that way you can give more meaningful, descriptive names to your sum and product types instead of going with the generic tuple and Either. In fact, some people advise using custom types because doing so essentially adds static type safety. This also applies to non-sum/product types, e.g.:
employment :: Bool -- so which one is unemployed and which one is employed?
data Empl = Employed | Unemployed
employment' :: Empl -- no ambiguity
or
person :: (Name, Age) -- yeah but when you see ("Erik", 29), is it just some random pair of name and age, or does it represent a person?
data Person = Person { name :: Name, age :: Age }
person' :: Person -- no ambiguity
Above, Person really encodes a product type, but with more meaning attached to it. You can also do newtype Person = Person (Name, Age), and it's actually quite equivalent anyway. So I always just prefer a nice and intention-revealing custom type. The same goes for Either and custom sum types.
So basically, Haskell gives you all the tools necessary to quickly build your own custom types with very clean and readable syntax, so it's best to use them rather than resort to primitive types like tuples and Either. However, it's nice to know about this isomorphism, for example in the context of generic programming. If you want to know more about that, you can google "scrap your boilerplate" and "template your boilerplate" and just "(datatype) generic programming".
P.S. The reason they are called sum and product types respectively is that they correspond to disjoint union (sum) and Cartesian product of sets. Therefore, the number of values (or unique instances, if you will) in the set that is described by the product type (a, b) is the product of the number of values in a and the number of values in b. For example (Bool, Bool) has exactly 2*2 values: (True, True), (False, False), (True, False), (False, True).
However Either Bool Bool has 2+2 values, Left True, Left False, Right True, Right False. So it happens to be the same number but that's obviously not the case in general.
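A case where the two counts differ can be checked by enumerating the values. The sketch below pairs Bool (2 values) with Ordering (3 values); the names inhabitants, pairs and eithers are hypothetical helpers introduced for this example.

```haskell
-- All values of a small enumerable type.
inhabitants :: (Bounded t, Enum t) => [t]
inhabitants = [minBound .. maxBound]

-- Product: every Bool combined with every Ordering, 2 * 3 values.
pairs :: [(Bool, Ordering)]
pairs = [(p, o) | p <- inhabitants, o <- inhabitants]

-- Sum: every Bool tagged Left plus every Ordering tagged Right, 2 + 3 values.
eithers :: [Either Bool Ordering]
eithers = map Left inhabitants ++ map Right inhabitants
```

Taking length of each list gives 6 for pairs and 5 for eithers.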
But of course this can also be said about our custom Person product type, so again, there is little reason to use Either and tuples.
There are some predefined versions in HaXml package with OneOfN, TwoOfN, .. constructors.
In a generic context, this is usually done inductively, using Either or
data (:+:) f g a = L1 (f a) | R1 (g a)
The latter is defined in GHC.Generics to match the funny way it handles things.
In fact, the generic approach is to break every algebraic datatype down into (:+:) and
data (:*:) f g a = f a :*: g a
along with some extra stuff. That is, it turns everything into binary sums and binary products.
In a more concrete context, you're almost always better off using a custom algebraic datatype for things bigger than pairs or with more options than Either, as others have discussed. Slightly larger tuples (triples and maybe 4-tuples) can be useful for local one-off constructs, but it's hard to see how you'd use larger general sum types as one-offs.
Such a type is usually called a sum, variant, union, or tagged union type. Because the capability is a built-in feature of data types in Haskell, there's no name for it widely used in Haskell code. The Report only calls them "algebraic datatypes" (usually abbreviated to ADT), so that's the name you'll see most often in comments, but this name includes types with only one data constructor, which are only sum types in the trivial sense.

Why aren't there existentially quantified type variables in GHC Haskell

There are universally quantified type variables, and there are existentially quantified data types. However, although people sometimes give pseudocode of the form exists a. Int -> a to help explain concepts, it doesn't seem like a compiler extension that there's any real interest in. Is this just a "there isn't much value in adding this" kind of thing (because it does seem valuable to me), or is there a problem like undecidability that makes it truly impossible?
EDIT:
I've marked viorior's answer as correct because it seems like it is probably the actual reason why this was not included. I'd like to add some additional commentary though just in case anyone would want to help clarify this more.
As requested in the comments, I'll give an example of why I would consider this useful. Suppose we have a data type as follows:
data Person a = Person
  { age :: Int
  , height :: Double
  , weight :: Int
  , name :: a
  }
So we choose to parameterize over a, which is a naming convention (I know that it probably makes more sense in this example to make a NamingConvention ADT with appropriate data constructors for the American "first, middle, last", the Hispanic "name, paternal name, maternal name", etc. But for now, just go with this).
So, there are several functions we see that basically ignore the type that Person is parameterized over. Examples would be
age :: Person a -> Int
height :: Person a -> Double
weight :: Person a -> Int
And any function built on top of these could similarly ignore the a type. For example:
atRiskForDiabetes :: Person a -> Bool
atRiskForDiabetes p = age p + weight p > 200
--Clearly, I am not actually a doctor
Now, if we have a heterogeneous list of people (of type [exists a. Person a]), we would like to be able to map some of our functions over the list. Of course, there are some useless ways to map:
heteroList :: [exists a. Person a]
heteroList = [Person 20 30.0 170 "Bob Jones", Person 50 32.0 140 3451115332]
extractedNames = map name heteroList
In this example, extractedNames is of course useless because it has type [exists a. a]. However, if we use our other functions:
totalWeight :: [exists a. Person a] -> Int
totalWeight = sum . map weight
numberAtRisk :: [exists a. Person a] -> Int
numberAtRisk = length . filter id . map atRiskForDiabetes
Now, we have something useful that operates over a heterogeneous collection (And, we didn't even involve typeclasses). Notice that we were able to reuse our existing functions. Using an existential data type would go as follows:
data SomePerson = forall a. SomePerson (Person a) --fixed, thanks viorior
But now, how can we use age and atRiskForDiabetes? We can't. I think that you would have to do something like this:
someAge :: SomePerson -> Int
someAge (SomePerson p) = age p
Which is really lame because you have to rewrite all of your combinators for a new type. It gets even worse if you want to do this with a data type that's parameterized over several type variables. Imagine this:
somewhatHeteroPipeList :: forall a b. [exists c d. Pipe a b c d]
I won't explain this line of thought any further, but just notice that you'd be rewriting a lot of combinators to do anything like this using just existential data types.
That being said, I hope I've given a mildly convincing case that this could be useful. If it doesn't seem useful (or if the example seems too contrived), feel free to let me know. Also, since I am firstly a programmer and have no training in type theory, it's a little difficult for me to see how to use Skolem's theorem (as posted by viorior) here. If anyone could show me how to apply it to the Person a example I gave, I would be very grateful. Thanks.
It is unnecessary.
By Skolem's Theorem we could convert existential quantifier into universal quantifier with higher rank types:
(∃b. F(b)) -> Int <===> ∀b. (F(b) -> Int)
Every existentially quantified type of rank n+1 can be encoded as a universally quantified type of rank n
Existentially quantified types are available in GHC, so the question is predicated on a false assumption.
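Applied to the Person a example from the question, one possible sketch looks like this. It relies on the GHC extensions ExistentialQuantification and RankNTypes, and withPerson is a hypothetical helper name: it is the single combinator that lifts any function of type forall a. Person a -> r to work on SomePerson, so age, weight and atRiskForDiabetes are reused unchanged rather than rewritten.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE RankNTypes #-}

data Person a = Person
  { age    :: Int
  , height :: Double
  , weight :: Int
  , name   :: a
  }

-- Hides the naming-convention type parameter.
data SomePerson = forall a. SomePerson (Person a)

-- Hypothetical helper: run any function that ignores `a` on a SomePerson.
withPerson :: (forall a. Person a -> r) -> SomePerson -> r
withPerson f (SomePerson p) = f p

atRiskForDiabetes :: Person a -> Bool
atRiskForDiabetes p = age p + weight p > 200

heteroList :: [SomePerson]
heteroList =
  [ SomePerson (Person 20 30.0 170 "Bob Jones")
  , SomePerson (Person 50 32.0 140 (3451115332 :: Integer))
  ]

totalWeight :: [SomePerson] -> Int
totalWeight = sum . map (withPerson weight)

numberAtRisk :: [SomePerson] -> Int
numberAtRisk = length . filter (withPerson atRiskForDiabetes)
```

The argument type of withPerson is the universally quantified, higher-rank side of the equivalence quoted above: a consumer of "some Person" is the same thing as a function that works for every a.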
