Is there a single word that means "non-recursive datatype with two constructors"? - haskell

Is there a word that describes data types that
have exactly two constructors; and
are not recursive?
i.e. describes these types
data Bool = False | True
data Maybe a = Nothing | Just a
data Either l r = Left l | Right r
but excludes these types
data Ordering = LT | EQ | GT -- too many constructors
data () = () -- too few constructors
data [a] = [] | a : [a] -- recursive definition

I think the trait of having exactly two constructors is quite meaningless. Imagine the types:
data StrictOrdering = LT | GT
data Ordering' = EQ | NEQ !StrictOrdering
The type Ordering' is equivalent to the Ordering you mentioned, differing only in '2-constructorness'.
On the other hand, Maybe Bool, Either Bool Bool and Bool are very different and don't seem to deserve the same name except for being called 'sum types'.
Now, one may find some similarities between exists a. Maybe a and Bool, but to point them out one needs more constraints than just '2-constructorness'.
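The equivalence between Ordering and Ordering' claimed above can be made concrete with a pair of conversion functions. This is only a minimal sketch: the constructor names Less, Greater, Same and Different are hypothetical renamings, chosen so that they do not clash with the Prelude's LT, EQ and GT.

```haskell
-- Hypothetical renaming of the answer's StrictOrdering and Ordering'.
data StrictOrdering = Less | Greater
  deriving (Eq, Show)

data Ordering' = Same | Different !StrictOrdering
  deriving (Eq, Show)

toOrdering' :: Ordering -> Ordering'
toOrdering' LT = Different Less
toOrdering' EQ = Same
toOrdering' GT = Different Greater

fromOrdering' :: Ordering' -> Ordering
fromOrdering' (Different Less)    = LT
fromOrdering' Same                = EQ
fromOrdering' (Different Greater) = GT

main :: IO ()
main =
  -- Round-tripping every Ordering value gives back the original.
  print (all (\o -> fromOrdering' (toOrdering' o) == o) [LT, EQ, GT])
```

The two-constructor type and the three-constructor type carry the same information, which is why counting constructors alone says little.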

"Having two constructors" is a property that carries little information about what can be represented by such a type. It means forcing to weak-head-normal-form (WHNF) allows a binary choice in a case statement. Perhaps you could call it a "Two Headed Type" to coin a phrase.
It is more useful to GHC as a way to create an optimized in-memory representation of the data: GHC uses pointer tagging, which can record the constructor in the pointer itself for types with up to 3 constructors (or 7 on 64-bit machines).

How about nonrecursive two-constructor sum-type?

What about a binary sum/coproduct type (of two types)?

Related

User-defined tuple-based data constructors

Me trying to grok The Little MLer again. TLMLer has this SML code
datatype a pizza = Bottom | Topping of (a * (a pizza))
datatype fish = Anchovy | Lox | Tuna
which I've translated as
data PizzaSh a = CrustSh | ToppingSh a (PizzaSh a)
data FishPSh = AnchovyPSh | LoxPSh | TunaPSh
and then an alternative closer to TLMLer perhaps
data PizzaSh2 a = CrustSh2 | ToppingSh2 (a, PizzaSh2 a)
And from each I create a pizza
fpizza1 = ToppingSh AnchovyPSh (ToppingSh TunaPSh (ToppingSh LoxPSh CrustSh))
fpizza2 = ToppingSh2 (AnchovyPSh, ToppingSh2 (LoxPSh, ToppingSh2 (TunaPSh, CrustSh2)))
respectively, which are of type PizzaSh FishPSh and PizzaSh2 FishPSh respectively.
But the second version (which is arguably closer to the original ML version) seems "offbeat." It's as if I'm creating a 2-tuple when I "cons" toppings together where the second member recursively expands. May I assume the parametric data constructor "function" of PizzaSh2 doesn't literally build a tuple, it's just borrowing the tuple as a cons strategy, correct? Which is preferable in Haskell, PizzaSh or PizzaSh2? As I understand, a tuple (cartesian product) data type will have a single constructor, e.g., data Point a b = Pt a b, not the disjoint union of ored-together (|) constructors. In SML the "*" indicates product, i.e., tuple, but, again, is this just a "tuple-like thing," i.e., it's just a tuple-looking way to cons a pizza together?
In Haskell, we prefer this style:
data PizzaSh a = CrustSh | ToppingSh a (PizzaSh a)
There is no need in Haskell to use a tuple there, since a data constructor like ToppingSh can take multiple arguments.
Using an additional pair as in
data PizzaSh2 a = CrustSh2 | ToppingSh2 (a, PizzaSh2 a)
creates a type which is almost isomorphic to the previous one, but is more cumbersome to handle since it requires more parentheses. E.g.
foo (ToppingSh x y)
-- vs
foo (ToppingSh2 (x, y))
bar :: PizzaSh a -> ...
bar (ToppingSh x y) = ....
-- vs
bar (ToppingSh2 (x, y)) = ...
Further, the type is indeed only almost isomorphic. When using an additional pair, because of laziness, we have one more value which can be represented in the type: we have a correspondence
ToppingSh x y <-> ToppingSh2 (x, y)
which breaks down in the case
??? <-> ToppingSh2 undefined
That is, ToppingSh2 can be applied to a non-terminating (or otherwise exceptional), pair-valued expression, and that constructs a value which cannot be represented using ToppingSh.
Operationally, to achieve that GHC uses a double indirection (roughly pointer-to-pointer, or thunk-returning-pair-of-thunks), which further slows down the code. Hence, it's also a bad choice from a performance point of view, if one cares about such micro-optimizations.
As far as the Haskell side goes, it absolutely is nesting a (,) constructor inside the ToppingSh2 constructor. It would violate Haskell's non-strict semantics to not do the nesting you requested. If the nesting were removed, you'd be unable to distinguish between undefined :: PizzaSh2 () and ToppingSh2 undefined :: PizzaSh2 (). And yes, most of the time, that isn't what you want. PizzaSh is the much more natural formulation in Haskell unless you have a particular need to be able to introduce another bottom into the evaluation process.
I can't address what's going on behind the scenes in any particular ML implementation. Though I can say that with strict evaluation semantics, there isn't a behavioral difference to observe, meaning compilers are free to use a wider variety of approaches.
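The extra value discussed in both answers can be observed directly. The sketch below reuses the question's two types; the helper functions hasTopping2, toPizzaSh and countToppings are hypothetical names introduced only for illustration.

```haskell
data PizzaSh a  = CrustSh  | ToppingSh a (PizzaSh a)
data PizzaSh2 a = CrustSh2 | ToppingSh2 (a, PizzaSh2 a)

-- Inspects only the outer constructor; the pair inside is never forced.
hasTopping2 :: PizzaSh2 a -> Bool
hasTopping2 CrustSh2       = False
hasTopping2 (ToppingSh2 _) = True

-- Converting to the two-field form has to pattern match on the pair.
toPizzaSh :: PizzaSh2 a -> PizzaSh a
toPizzaSh CrustSh2               = CrustSh
toPizzaSh (ToppingSh2 (x, rest)) = ToppingSh x (toPizzaSh rest)

countToppings :: PizzaSh a -> Int
countToppings CrustSh            = 0
countToppings (ToppingSh _ rest) = 1 + countToppings rest

main :: IO ()
main = do
  -- The outer constructor is observable even though the pair is bottom.
  print (hasTopping2 (ToppingSh2 undefined :: PizzaSh2 ()))
  -- For fully defined values the two representations agree.
  print (countToppings (toPizzaSh (ToppingSh2 ('a', ToppingSh2 ('b', CrustSh2)))))
```

Applying toPizzaSh to ToppingSh2 undefined and forcing the result would fail, because no ToppingSh value corresponds to it.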

Using different Ordering for Sets

I was reading Chapter 2 of Purely Functional Data Structures, which talks about unordered sets implemented as binary search trees. The code is written in ML, and ends up showing a signature ORDERED and a functor UnbalancedSet(Element: ORDERED): SET. Coming from more of a C++ background, this makes sense to me; custom comparison function objects form part of the type and can be passed in at construction time, and this seems fairly analogous to the ML functor way of doing things.
When it comes to Haskell, it seems the behavior depends only on the Ord instance, so if I wanted to have a set that had its order reversed, it seems like I'd have to use a newtype instance, e.g.
newtype ReverseInt = ReverseInt Int deriving (Eq, Show)
instance Ord ReverseInt where
  compare (ReverseInt a) (ReverseInt b)
    | a == b = EQ
    | a < b = GT
    | a > b = LT
which I could then use in a set:
let x = Set.fromList $ map ReverseInt [1..5]
Is there any better way of doing this sort of thing that doesn't resort to using newtype to create a different Ord instance?
No, this is really the way to go. Yes, having a newtype is sometimes annoying but you get some big benefits:
When you see a Set a and you know a, you immediately know what type of comparison it uses (sort of the same way that purity makes code more readable by not making you have to trace execution). You don't have to know where that Set a comes from.
For many cases, you can coerce your way through multiple newtypes at once. For example, I can turn xs = [1,2,3] :: [Int] into ys = [ReverseInt 1, ReverseInt 2, ReverseInt 3] :: [ReverseInt] just using ys = coerce xs :: [ReverseInt]. Unfortunately, that isn't the case for Set (and it shouldn't - you'd need the coercion function to be monotonic to not screw up the data structure invariants, and there is not yet a way to express that in the type system).
newtypes end up being more composable than you expect. For example, the ReverseInt type you made already exists in a form that generalizes to reversing any type with an Ord constraint: it is called Down. To be explicit, you could use Down Int instead of ReverseInt, and you get the instance you wrote out for free!
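As a small illustration of the last two points, the sketch below builds a reversed set with Down and wraps a list with coerce. It assumes the containers package is available for Data.Set; unDown is a hypothetical helper written here to avoid depending on a particular version of base.

```haskell
import Data.Coerce (coerce)
import Data.Ord (Down (..))
import qualified Data.Set as Set

-- A set of Ints ordered largest-first, via the Ord instance of Down.
descending :: Set.Set (Down Int)
descending = Set.fromList (map Down [1 .. 5])

-- Hypothetical helper: unwrap a Down by pattern matching.
unDown :: Down a -> a
unDown (Down x) = x

-- coerce rewraps the whole list without traversing it.
wrapped :: [Down Int]
wrapped = coerce ([1, 2, 3] :: [Int])

main :: IO ()
main = do
  print (map unDown (Set.toAscList descending))
  print (map unDown wrapped)
```

The "ascending" traversal of the set yields the integers from largest to smallest, because the ordering in use is the one attached to Down Int.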
Of course, if you still feel very strongly about this, nothing is stopping you from writing your version of Set which has to have a field which is the comparison function it uses. Something like
data Set a = Set { comparisonKey :: a -> a -> Ordering
                 , ...
                 }
Then, every time you make a Set, you would have to pass in the comparison key.
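To make that alternative concrete, here is a minimal sketch of an unbalanced search tree that carries its comparison function as a field. All names (CmpSet, emptyWith, insert, toList) are hypothetical; this is not the API of Data.Set.

```haskell
-- Hypothetical names throughout; this is not the API of Data.Set.
data Tree a = Leaf | Node (Tree a) a (Tree a)

data CmpSet a = CmpSet
  { comparison :: a -> a -> Ordering
  , root       :: Tree a
  }

-- The comparison must be supplied whenever a set is created.
emptyWith :: (a -> a -> Ordering) -> CmpSet a
emptyWith cmp = CmpSet cmp Leaf

insert :: a -> CmpSet a -> CmpSet a
insert x (CmpSet cmp t) = CmpSet cmp (go t)
  where
    go Leaf = Node Leaf x Leaf
    go node@(Node l y r) = case cmp x y of
      LT -> Node (go l) y r
      GT -> Node l y (go r)
      EQ -> node -- already present

-- In-order traversal, i.e. ascending with respect to the stored comparison.
toList :: CmpSet a -> [a]
toList (CmpSet _ t) = go t []
  where
    go Leaf acc         = acc
    go (Node l y r) acc = go l (y : go r acc)

main :: IO ()
main = print (toList (foldr insert (emptyWith (flip compare)) [3, 1, 2 :: Int]))
```

One cost of this design is that the type no longer says which ordering a set uses, so an operation combining two sets cannot know whether their comparison functions agree.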

Why do We Need Sum Types?

Imagine a language which doesn't allow multiple value constructors for a data type. Instead of writing
data Color = White | Black | Blue
we would have
data White = White
data Black = Black
data Blue = Blue
type Color = White :|: Black :|: Blue
where :|: (here it's not | to avoid confusion with sum types) is a built-in type union operator. Pattern matching would work in the same way
show :: Color -> String
show White = "white"
show Black = "black"
show Blue = "blue"
As you can see, in contrast to coproducts it results in a flat structure so you don't have to deal with injections. And, unlike sum types, it allows types to be combined arbitrarily, resulting in greater flexibility and granularity:
type ColorsStartingWithB = Black :|: Blue
I believe it wouldn't be a problem to construct recursive data types as well
data Nil = Nil
data Cons a = Cons a (List a)
type List a = Cons a :|: Nil
I know union types are present in TypeScript and probably other languages, but why did the Haskell committee choose ADTs over them?
Haskell's sum type is very similar to your :|:.
The difference between the two is that the Haskell sum type | is a tagged union, while your "sum type" :|: is untagged.
Tagged means every instance is unique: you can distinguish Int | Int from Int (actually, this holds for any a):
data EitherIntInt = Left Int | Right Int
In this case: Either Int Int carries more information than Int because there can be a Left and Right Int.
In your :|:, you cannot distinguish those two:
type EitherIntInt = Int :|: Int
How do you know if it was a left or right Int?
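With the Prelude's own Either, the tag is what lets a function tell the two Ints apart. A minimal sketch (describe is a hypothetical name):

```haskell
-- The constructor records which side the Int came from.
describe :: Either Int Int -> String
describe (Left n)  = "left " ++ show n
describe (Right n) = "right " ++ show n

main :: IO ()
main = do
  putStrLn (describe (Left 5))
  putStrLn (describe (Right 5))
  -- Same payload, different tag: the two values are not equal.
  print ((Left 5 :: Either Int Int) == Right 5)
```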
See the comments for an extended discussion of the section below.
Tagged unions have another advantage: The compiler can verify whether you as the programmer handled all cases, which is implementation-dependent for general untagged unions. Did you handle all cases in Int :|: Int? Either this is isomorphic to Int by definition or the compiler has to decide which Int (left or right) to choose, which is impossible if they are indistinguishable.
Consider another example:
type (Integral a, Num b) => IntegralOrNum a b = a :|: b -- untagged
data (Integral a, Num b) => IntegralOrNum a b = Either a b -- tagged
What is 5 :: IntegralOrNum Int Double in the untagged union? It is both an instance of Integral and Num, so we can't decide for sure and have to rely on implementation details. On the other hand, the tagged union knows exactly what 5 should be because it is branded with either Left or Right.
As for naming: The disjoint union in Haskell is a union type. ADTs are only a means of implementing these.
I will try to expand the categorical argument mentioned by @BenjaminHodgson.
Haskell can be seen as the category Hask, in which objects are types and morphisms are functions between types (disregarding bottom).
We can define a product in Hask as tuple - categorically speaking it meets the definition of the product:
A product of a and b is the type c equipped with projections p and q such that p :: c -> a and q :: c -> b and for any other candidate c' equipped with p' and q' there exists a morphism m :: c' -> c such that we can write p' as p . m and q' as q . m.
Read up on this in Bartosz' Category Theory for Programmers for further information.
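For the pair type the mediating morphism m in the definition above can be written down explicitly. The following is a sketch under the usual assumption of ignoring bottom; factorProduct and the candidate projections p' and q' are hypothetical names.

```haskell
-- Given a candidate c' with projections f and g, build m :: c' -> (a, b).
factorProduct :: (c' -> a) -> (c' -> b) -> c' -> (a, b)
factorProduct f g x = (f x, g x)

-- A hypothetical candidate: a triple whose third component is ignored.
p' :: (Int, Bool, String) -> Int
p' (n, _, _) = n

q' :: (Int, Bool, String) -> Bool
q' (_, b, _) = b

main :: IO ()
main = do
  let m = factorProduct p' q'
      x = (1, True, "extra")
  -- p' = fst . m and q' = snd . m, checked at one sample point.
  print (fst (m x) == p' x && snd (m x) == q' x)
```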
Now for every category, there exists the opposite category, which has the same objects but reverses all the arrows. The coproduct is thus:
The coproduct c of a and b is the type c equipped with injections i :: a -> c and j :: b -> c such that for all other candidates c' with i' and j' there exists a morphism m :: c -> c' such that i' = m . i and j' = m . j.
Let's see how the tagged and untagged union perform given this definition:
The untagged union of a and b is the type a :|: b such that:
i :: a -> a :|: b is defined as i a = a and
j :: b -> a :|: b is defined as j b = b
However, we know that a :|: a is isomorphic to a. Based on that observation we can define a second candidate for the coproduct, a :|: a :|: b, which is equipped with the exact same morphisms. Therefore, there is no single best candidate, since the morphism m between a :|: a :|: b and a :|: b is id. id is a bijection, which implies that m is invertible and can "convert" the types either way. A visual representation of that argument. Replace p with i and q with j.
Restricting ourselves to Either avoids this problem, as you can verify yourself with:
i = Left and
j = Right
This shows that the categorical complement of the product type is the disjoint union, not the set-based union.
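For Either, the mediating morphism required by the coproduct definition is the Prelude function either. A minimal sketch, again ignoring bottom; factorSum, i' and j' are hypothetical names.

```haskell
-- Given a candidate c' with injections f and g, build m :: Either a b -> c'.
factorSum :: (a -> c') -> (b -> c') -> Either a b -> c'
factorSum = either

-- A hypothetical candidate: String, with two ways of injecting into it.
i' :: Int -> String
i' n = "int " ++ show n

j' :: Bool -> String
j' b = "bool " ++ show b

main :: IO ()
main = do
  let m = factorSum i' j'
  -- i' = m . Left and j' = m . Right, checked at sample points.
  print (m (Left 3) == i' 3 && m (Right True) == j' True)
```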
The set union is part of the disjoint union, because we can define it as follows:
data Left a = Left a
data Right b = Right b
type DisjUnion a b = Left a :|: Right b
Because we have shown above that the set union is not a valid candidate for the coproduct of two types, we would lose many "free" properties (which follow from parametricity as leftaroundabout mentioned) by not choosing the disjoint union in the category Hask (because there would be no coproduct).
This is an idea I've thought a lot about myself: a language with “first-class type algebra”. Pretty sure we could do about everything this way that we do in Haskell. Certainly if these disjunctions were, like Haskell alternatives, tagged unions; then you could directly rewrite any ADT to use them. In fact GHC can do this for you: if you derive a Generic instance, a variant type will be represented by a :+: construct, which is in essence just Either.
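The Generic representation mentioned above can be inspected with a small type class. This is a sketch only: GPath and path are hypothetical helpers, and the exact nesting of L1 and R1 that GHC chooses for a given type is not something the code relies on.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeOperators #-}

import GHC.Generics

data Color = White | Black | Blue
  deriving (Generic, Show)

-- Hypothetical helper class: records the L1/R1 injections that lead to
-- the constructor in the generic representation.
class GPath f where
  gpath :: f p -> String

instance GPath U1 where
  gpath _ = ""

instance (GPath f, GPath g) => GPath (f :+: g) where
  gpath (L1 x) = 'L' : gpath x
  gpath (R1 x) = 'R' : gpath x

-- Skip over the metadata wrappers for the datatype and its constructors.
instance GPath f => GPath (M1 i c f) where
  gpath (M1 x) = gpath x

path :: (Generic a, GPath (Rep a)) => a -> String
path = gpath . from

main :: IO ()
main = mapM_ (\c -> putStrLn (show c ++ ": " ++ path c)) [White, Black, Blue]
```

Each constructor is reached by a different sequence of binary injections, which is the sense in which the three-way choice is encoded as nested two-way sums.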
I'm not so sure if untagged unions would also do. As long as you require the types participating in a sum to be discernibly different, the explicit tagging should in principle not be necessary. The language would then need a convenient way to match on types at runtime. Sounds a lot like what dynamic languages do – obviously comes with quite some overhead though.
The biggest problem would be that if the types on both sides of :|: must be unequal then you lose parametricity, which is one of Haskell's nicest traits.
Given that you mention TypeScript, it is instructive to have a look at what its docs have to say about its union types. The example there starts from a function...
function padLeft(value: string, padding: any) { //etc.
... that has a flaw:
The problem with padLeft is that its padding parameter is typed as any. That means that we can call it with an argument that’s neither a number nor a string
One plausible solution is then suggested, and rejected:
In traditional object-oriented code, we might abstract over the two types by creating a hierarchy of types. While this is much more explicit, it’s also a little bit overkill.
Rather, the handbook suggests...
Instead of any, we can use a union type for the padding parameter:
function padLeft(value: string, padding: string | number) { // etc.
Crucially, the concept of union type is then described in this way:
A union type describes a value that can be one of several types.
A string | number value in TypeScript can be either of string type or of number type, as string and number are subtypes of string | number (cf. Alexis King's comment to the question). An Either String Int value in Haskell, however, is neither of String type nor of Int type -- its only, monomorphic, type is Either String Int. Further implications of that difference show up in the remainder of the discussion:
If we have a value that has a union type, we can only access members that are common to all types in the union.
In a roughly analogous Haskell scenario, if we have, say, an Either Double Int, we cannot apply (2*) directly on it, even though both Double and Int have instances of Num. Rather, something like bimap is necessary.
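A minimal sketch of that bimap usage (scale is a hypothetical name):

```haskell
import Data.Bifunctor (bimap)

-- (2 *) has to be supplied once per side; each use is at a different type.
scale :: Either Double Int -> Either Double Int
scale = bimap (2 *) (2 *)

main :: IO ()
main = do
  print (scale (Left 1.5))
  print (scale (Right 4))
```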
What happens when we need to know specifically whether we have a Fish? [...] we’ll need to use a type assertion:
let pet = getSmallPet();
if ((<Fish>pet).swim) {
    (<Fish>pet).swim();
}
else {
    (<Bird>pet).fly();
}
This sort of downcasting/runtime type checking is at odds with how the Haskell type system ordinarily works, even though it can be implemented using the very same type system (also cf. leftaroundabout's answer). In contrast, there is nothing to figure out at runtime about the type of an Either Fish Bird: the case analysis happens at value level, and there is no need to deal with anything failing and producing Nothing (or worse, null) due to runtime type mismatches.
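For comparison, a value-level case analysis on Either Fish Bird might look like the sketch below. Fish, Bird and move are hypothetical stand-ins for the handbook's example, not a translation of its API.

```haskell
-- Hypothetical stand-ins for the handbook's Fish and Bird.
newtype Fish = Fish String
newtype Bird = Bird String

-- The constructor says which case we are in; no cast or runtime type test.
move :: Either Fish Bird -> String
move (Left (Fish name))  = name ++ " swims"
move (Right (Bird name)) = name ++ " flies"

main :: IO ()
main = mapM_ (putStrLn . move) [Left (Fish "Nemo"), Right (Bird "Tweety")]
```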

What is the name for the contrary of Tuple or Either with more than two options?

There is a Tuple as a Product of any number of types and there is an Either as a Sum of two types. What is the name for a Sum of any number of types, something like this
data Thing a b c d ... = Thing1 a | Thing2 b | Thing3 c | Thing4 d | ...
Is there any standard implementation?
Before I make the suggestion against using such types, let me explain some background.
Either is a sum type, and a pair or 2-tuple is a product type. Sums and products can exist over arbitrarily many underlying types (sets). However, in Haskell, only tuples come in a variety of sizes out of the box. Either, on the other hand, has to be (arbitrarily) nested to achieve that: Either Foo (Either Bar Baz).
Of course it's easy to instead define e.g. the types Either3 and Either4 etc, in the spirit of 3-tuples, 4-tuples and so on.
data Either3 a b c = Left a | Middle b | Right c
data Either4 a b c d = LeftMost a | Left b | Right c | RightMost d
...if you really want. Or you can find a library that does this, but I doubt you could call it "standard" by any standards...
However, if you do define your own generic sum and product types, they will be completely isomorphic to any type that is structurally equivalent, regardless of where it is defined. This means that you can, with relative ease, nicely adapt your code to interface with any other code that uses an alternative definition.
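The isomorphism between a custom three-way sum and nested Either can be spelled out as two conversion functions. In this sketch the constructors are named First, Second and Third (hypothetical names) so that they do not clash with the Prelude's Left and Right.

```haskell
-- Hypothetical constructor names, chosen to avoid clashing with Either's.
data Either3 a b c = First a | Second b | Third c
  deriving (Eq, Show)

toNested :: Either3 a b c -> Either a (Either b c)
toNested (First x)  = Left x
toNested (Second y) = Right (Left y)
toNested (Third z)  = Right (Right z)

fromNested :: Either a (Either b c) -> Either3 a b c
fromNested (Left x)          = First x
fromNested (Right (Left y))  = Second y
fromNested (Right (Right z)) = Third z

main :: IO ()
main = do
  let samples = [First 1, Second 'x', Third True] :: [Either3 Int Char Bool]
  -- Converting there and back returns the original values.
  print (map (fromNested . toNested) samples == samples)
```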
Furthermore, it is even very likely to be beneficial because that way you can give more meaningful, descriptive names to your sum and product types, instead of going with the generic tuple and either. In fact, some people advise for using custom types because it essentially adds static type safety. This also applies to non-sum/product types, e.g.:
employment :: Bool -- so which one is unemployed and which one is employed?
data Empl = Employed | Unemployed
employment' :: Empl -- no ambiguity
or
person :: (Name, Age) -- yeah but when you see ("Erik", 29), is it just some random pair of name and age, or does it represent a person?
data Person = Person { name :: Name, age :: Age }
person' :: Person -- no ambiguity
— above, Person really encodes a product type, but with more meaning attached to it. You can also do newtype Person = Person (Name, Age), and it's actually quite equivalent anyway. So I always just prefer a nice and intention-revealing custom type. The same goes about Either and custom sum types.
So basically, Haskell gives you all the tools necessary to quickly build your own custom types with very clean and readable syntax, so it's best to use them rather than resort to primitive types like tuples and Either. However, it's nice to know about this isomorphism, for example in the context of generic programming. If you want to know more about that, you can google up "scrap your boilerplate" and "template your boilerplate" and just "(datatype) generic programming".
P.S. The reason they are called sum and product types respectively is that they correspond to the disjoint union of sets (sum) and the Cartesian product of sets. Therefore, the number of values (or unique instances if you will) in the set that is described by the product type (a, b) is the product of the number of values in a and the number of values in b. For example (Bool, Bool) has exactly 2*2 values: (True, True), (False, False), (True, False), (False, True).
However Either Bool Bool has 2+2 values, Left True, Left False, Right True, Right False. So it happens to be the same number but that's obviously not the case in general.
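Pairing Bool with a three-valued type makes the difference between the two counts visible. A small sketch that enumerates the values (ignoring bottom); pairs and sums are hypothetical names.

```haskell
-- All values of the product type: 2 * 3 of them.
pairs :: [(Bool, Ordering)]
pairs = [(b, o) | b <- [minBound .. maxBound], o <- [minBound .. maxBound]]

-- All values of the sum type: 2 + 3 of them.
sums :: [Either Bool Ordering]
sums = map Left [minBound .. maxBound] ++ map Right [minBound .. maxBound]

main :: IO ()
main = print (length pairs, length sums)
```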
But of course this can also be said about our custom Person product type, so again, there is little reason to use Either and tuples.
There are some predefined versions in HaXml package with OneOfN, TwoOfN, .. constructors.
In a generic context, this is usually done inductively, using Either or
data (:+:) f g a = L1 (f a) | R1 (g a)
The latter is defined in GHC.Generics to match the funny way it handles things.
In fact, the generic approach is to break every algebraic datatype down into (:+:) and
data (:*:) f g a = f a :*: g a
along with some extra stuff. That is, it turns everything into binary sums and binary products.
In a more concrete context, you're almost always better off using a custom algebraic datatype for things bigger than pairs or with more options than Either, as others have discussed. Slightly larger tuples (triples and maybe 4-tuples) can be useful for local one-off constructs, but it's hard to see how you'd use larger general sum types as one-offs.
Such a type is usually called a sum, variant, union, or tagged union type. Because the capability is a built-in feature of data types in Haskell, there's no name for it widely used in Haskell code. The Report only calls them "algebraic datatypes" (usually abbreviated to ADT), so that's the name you'll see most often in comments, but this name includes types with only one data constructor, which are only sum types in the trivial sense.

Ordering of Bool types (i.e. True > False) - Why? [duplicate]

Can someone please explain the following output?
Prelude> compare True False
GT
it :: Ordering
Prelude> compare False True
LT
it :: Ordering
Why are Bool type values ordered in Haskell - especially, since we can demonstrate that values of True and False are not exactly 1 and 0 (unlike many other languages)?
This is how the derived instance of Ord works:
data D = A | B | C deriving Ord
Given that datatype, we get C > B > A. Bool is defined as False | True, and it kind of makes sense when you look at other examples such as:
data Maybe a = Nothing | Just a
data Either a b = Left a | Right b
In each case, having "some" ("truthy") value is greater than having no value at all (or having a "left", "bad" or "falsy" value).
While Bool is not Int, it can be converted to the 0,1 fragment of Int since it is an Enum type.
fromEnum False = 0
fromEnum True = 1
Now, the Enum could have been different, reversing 0 and 1, but that would probably be surprising to most programmers thinking about bits.
Since Bool has an Enum instance, everything else being equal, it's better to define an Ord instance which follows the same order, satisfying
compare x y = compare (fromEnum x) (fromEnum y)
In fact, every instance generated by deriving (Eq, Ord, Enum) satisfies this property.
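The property can be checked exhaustively for a small derived type. A minimal sketch using the D type from the earlier answer; agrees is a hypothetical name.

```haskell
data D = A | B | C
  deriving (Eq, Ord, Enum, Bounded, Show)

-- Compare every pair of constructors both ways.
agrees :: Bool
agrees =
  and
    [ compare x y == compare (fromEnum x) (fromEnum y)
    | x <- [minBound .. maxBound :: D]
    , y <- [minBound .. maxBound]
    ]

main :: IO ()
main = do
  print agrees
  print (compare False True == compare (fromEnum False) (fromEnum True))
```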
On a more theoretical note, logicians tend to order propositions from the strongest to the weakest (forming a lattice). In this structure, False (as a proposition) is the bottom, i.e. the least element, while True is the top. While this is only a convention (theory would be just as nice if we picked the opposite ordering), it's a good thing to be consistent.
Minor downside: the implication boolean connective is actually p <= q expressing that p implies q, instead of the converse as the "arrow" seems to indicate.
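The remark about implication can be checked against its truth table. A minimal sketch (implies is a hypothetical name):

```haskell
-- With False < True, "p implies q" is exactly p <= q.
implies :: Bool -> Bool -> Bool
implies p q = p <= q

main :: IO ()
main =
  mapM_
    print
    [(p, q, p `implies` q) | p <- [False, True], q <- [False, True]]
```

The only row that yields False is the one where p is True and q is False, as expected for implication.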
Let me answer your question with a question: Why is there an Ord instance for ()?
Unlike Bool, () has only one possible value: (). So why the hell would you ever want to compare it? There is only one value possible!
Basically, it's useful if all or most of the standard basic types have instances for common classes. It makes it easier to derive instances for your own types. If Foo doesn't have an Ord instance, and your new type has a single Foo field, then you can't auto-derive an Ord instance.
You might, for example, have some kind of tree type where we can attach several items of information to the leaves. Something like Tree x y z. And you might want to have an Eq instance to compare trees. It would be annoying if Tree () Int String didn't have an Eq instance just because () doesn't. So that's why () has Eq (and Ord and a few others).
Similar remarks apply to Bool. It might not sound particularly useful to compare two bool values, but it would be irritating if your Ord instance vanishes as soon as you put a bool in there.
(One other complicating factor is that sometimes we want Ord because there's a logically meaningful ordering for things, and sometimes we just want some arbitrary order, typically so we can use something as a key for Data.Map or similar. Arguably there ought to be two separate classes for that… but there isn't.)
Basically, it comes from math. In set theory or category theory boolean functions are usually thought of as classifiers of subsets/subobjects. In plain terms, function f :: a -> Bool is identified with filter f :: [a] -> [a]. So, if we change one value from False to True, the resulting filtered list (subset, subobject, whatever) is going to have more elements. Therefore, True is considered "bigger" than False.
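The filtering argument can be illustrated with two predicates where one is pointwise below the other. This is a sketch with hypothetical predicates p and q on a sample list, not a proof.

```haskell
import Data.List (isSubsequenceOf)

-- p n <= q n for every n: whenever p holds, q holds too.
p, q :: Int -> Bool
p n = n > 3
q n = n > 1

main :: IO ()
main = do
  print (filter p [1 .. 5])
  print (filter q [1 .. 5])
  -- Raising answers from False to True can only keep more elements.
  print (filter p [1 .. 5] `isSubsequenceOf` filter q [1 .. 5])
```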

Resources