Cannot parse data constructor in a data/newtype declaration

I have the type Card, which consists of a suit and a rank:
data Suit = A|B deriving (Show, Eq)
data Rank = 1|2 deriving (Show, Eq)
data Card = Card Suit Rank deriving (Show, Eq)
The data Rank declaration seems to be wrong, since a number cannot be used as a data constructor. How do I write a correct declaration if my cards are A1|B1|A2|B2?
Thank you

It might look as if the statement:
data Suit = A | B
is only defining one thing, the type Suit, as a collection/set of arbitrary objects. Actually, though, it's defining three things: the type Suit and two constructors A and B for creating values of that type.
If the definition:
data Rank = 1 | 2
actually worked, it wouldn't be defining Rank as a collection of the numbers 1 and 2, it would be redefining the numbers 1 and 2 as constructors/values of the new type Rank, and you'd no longer be able to use them as regular numbers. (For example, the expression n + 1 would now be a type error, because (+) expects a number, and 1 would have been redefined as a Rank).
Fortunately or unfortunately, Haskell won't accept numbers as constructor names -- they need to be valid identifiers starting with uppercase letters (or operators that start with a colon).
So, there are two usual approaches to defining a type like Rank that's meant to represent some subset of numbers. The first, as noted in the comments, is to define it much like you already have, but change your numbers into valid identifiers by prefixing with an uppercase letter:
data Rank = R1 | R2
The advantage of this is that it guarantees that only valid ranks can be represented. Here, only ranks 1 and 2 are allowed. If someone tried to write R3 somewhere, it wouldn't work, because that constructor hasn't been defined. The big disadvantage is that this quickly becomes unruly. If these were playing cards, the definition would be:
data Rank = R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | R11 | R12 | R13
and a function to, say, assign point values to cards for rummy would look like:
points :: Rank -> Int
points R1 = 10 -- Ace worth 10
points R2 = 2
points R3 = 3
...
points R9 = 9
points R10 = 10 -- 10 and face cards all worth 10
points R11 = 10
points R12 = 10
points R13 = 10
(In real code, you'd use more advanced Haskell features like a derived Enum instance to deal with this.)
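As a rough sketch of the derived Enum approach mentioned above (the Bounded instance and the helper name rankNumber are assumptions added for illustration, not part of the answer):

```haskell
-- Thirteen ranks, as in the playing-card example above.
data Rank = R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | R11 | R12 | R13
  deriving (Show, Eq, Ord, Enum, Bounded)

-- Hypothetical helper: the derived fromEnum numbers constructors from 0,
-- so adding 1 recovers the face number of the rank.
rankNumber :: Rank -> Int
rankNumber r = fromEnum r + 1

-- Rummy point values without one equation per constructor.
points :: Rank -> Int
points R1 = 10                     -- Ace worth 10
points r  = min 10 (rankNumber r)  -- 10 and face cards capped at 10
```

With this, map points [minBound .. maxBound] walks over every rank without listing the constructors again.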
The second approach is to define your rank in terms of an existing numeric type:
data Rank = Rank Int -- actually, `data` would probably be `newtype`
This defines two things, a type named Rank, and a constructor, also named Rank. (This is okay, as types and constructors live in different namespaces.)
Instead of defining Rank as a discrete set of values with one explicit constructor per value, this definition essentially makes the type Rank an Int that's "tagged" with the constructor Rank.
The disadvantage of this approach is that it's now possible to create invalid ranks (since someone can write Rank 14). The advantage is that it's often easier to work with. For example, you can extract the integer from the rank, so points can be defined as:
points :: Rank -> Int
points (Rank 1) = 10 -- Ace is worth 10
points (Rank r) | r >= 10 = 10 -- 10 and face are worth 10
                | otherwise = r -- rest are worth their rank
Note that, with this set of definitions:
data Suit = A | B deriving (Show, Eq)
newtype Rank = Rank Int deriving (Show, Eq)
data Card = Card Suit Rank deriving (Show, Eq)
you'd construct a Card value using an expression like Card A (Rank 1) for your "A1" card.
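For instance, the four cards from the question could be built like this (a small sketch; the name deck is made up for illustration):

```haskell
data Suit = A | B deriving (Show, Eq)
newtype Rank = Rank Int deriving (Show, Eq)
data Card = Card Suit Rank deriving (Show, Eq)

-- All four cards from the question, in the order A1, A2, B1, B2.
deck :: [Card]
deck = [Card s (Rank r) | s <- [A, B], r <- [1, 2]]
```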
There's actually a third approach. Some people might skip defining the Rank type entirely and either write:
data Suit = A | B deriving (Show, Eq)
data Card = Card Suit Int deriving (Show, Eq)
or write the equivalent code using a type alias:
data Suit = A | B deriving (Show, Eq)
type Rank = Int
data Card = Card Suit Rank deriving (Show, Eq)
Note that the type alias here is really just for documentation. Here, Rank and Int are exactly the same type and can be used interchangeably. Using Rank just makes the code easier to understand by making it clear where the programmer intended an Int to stand for a Rank versus an integer used for some other purpose.
The main advantage of this approach is that you can avoid including the word Rank in lots of places (e.g., cards are written Card A 1 instead of Card A (Rank 1), and the definition of points wouldn't need to pattern match the argument on Rank r, etc.) The main disadvantage is that it blurs the distinction between Rank and other integers and makes it easier to make programming errors like using the Rank where you meant to use the points and vice versa.

Related

Defining types in Haskell in terms of other types and typeclasses struggling

Can someone please explain to me how we can define a new type in terms of itself in Haskell. Below is the snippet of code I am struggling to understand. I do not understand how we can define a new type in terms of itself. We are saying Card is a new type with constructors Card Rank Suit. What does this even mean?
Any help would be greatly appreciated.
data Suit = Spades | Hearts | Clubs | Diamonds
  deriving (Show, Eq)
data Rank = Numeric Int | Jack | Queen | King | Ace
  deriving Show
data Card = Card Rank Suit
  deriving Show

We are saying Card is a new type with constructors Card Rank Suit. What does this even mean?
Well, first of all it's wrong. Card is a new type with the single constructor Card; the latter takes two arguments, which are of type Rank and Suit.
The confusing thing here is that you're really dealing with two different things that are both called Card. Let's disambiguate:
data CardT = CardC Rank Suit
  deriving Show
That expresses the same definition, but now it's clear that the type and constructor aren't the same thing. CardT lives in the type-level language, CardC lives in the value-level language.
> :t CardC
CardC :: Rank -> Suit -> CardT

How to write the instance for Show of a given datatype shorter?

I'm quite new to Haskell, and I wonder how I can write the following code shorter:
data Suite = Club | Heart | Spade | Diamond
data Value = Two | Three | Four | Five | Six | Seven | Eight | Nine | Ten | Jack | Queen |
  King | Ace
data Card = Card Suite Value
instance Show Suite where
  show Club = "Club"
  show Heart = "Heart"
  show Spade = "Spade"
  show Diamond = "Diamond"
instance Enum Suite where
  enumFromTo Club Diamond = [Club, Heart, Spade, Diamond]
  enumFromTo Heart Diamond = [Heart, Spade, Diamond]
  enumFromTo Club Spade = [Club, Heart, Spade]
instance Show Value where
  show Two = "Two"
  show Three = "Three"
  show Four = "Four"
  show Five = "Five"
  show Six = "Six"
  show Seven = "Seven"
  show Eight = "Eight"
  show Nine = "Nine"
  show Ten = "Ten"
  show Jack = "Jack"
  show Queen = "Queen"
  show King = "King"
  show Ace = "Ace"
I want to write the instance for Show Value in a much shorter way. Is there a good way to do this, or do I need to write all of it?
I also wonder how I could go on from here if I want to define the same kind of instances for Eq Card and Ord Card.
So far
instance Eq Card where
  Card _ _ == _ = False
instance Ord Card where
  Card Heart Three > Card Heart Two = True
worked, but to write every single possibility would be quite a lot of work.
Thanks for any answers!
Edit: I'm aware of the possibility of appending deriving (Show, etc.), but I don't want to use it.

You've rejected deriving these instances, which is the main way we avoid that much boilerplate. The most obvious remaining elementary way to shorten the Show Value is to use a case expression. This uses an extra line but shortens each case slightly:
instance Show Value where
  show x = case x of
    Two -> "Two"
    Three -> "Three"
    -- etc.
Expanding to non-elementary ways, you could
Use generics, either the somewhat more modern version in GHC.Generics or (probably easier in this case) the one in Data.Data. For these, you'll need deriving Generic or deriving Data, respectively, and then you can write (or dig up on Hackage) generic versions of the class methods you need. Neither of these approaches seems very appropriate for a Haskell beginner, but you can work up to them over a number of months.
Use Template Haskell. This is a very advanced language feature, and despite working with Haskell for many years, I have not really begun to grasp how to program with it. Good luck!
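To give a flavour of the Data.Data route, here is a minimal sketch. It assumes you are willing to write deriving Data even though you don't want deriving Show; toConstr and showConstr are the Data.Data functions that recover the constructor name:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, toConstr, showConstr)

data Value = Two | Three | Four | Five | Six | Seven | Eight | Nine | Ten
           | Jack | Queen | King | Ace
  deriving (Data)

-- Show the constructor name via Data.Data instead of one equation per case.
instance Show Value where
  show = showConstr . toConstr
```

This only reproduces the constructor name, so it is not a general replacement for a derived Show on types whose constructors have fields.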

If you just want your show method call to print the name of the constructor (as it appears here), there's no need to manually instance them at all. You can automatically derive the Show instance thusly:
data Suit = Club | Heart | Spade | Diamond
  deriving Show
data Value = Two | Three | Four | Five | Six | Seven
  | Eight | Nine | Ten | Jack | Queen | King | Ace
  deriving Show

In some cases, such as instance Show Value, there is no good way to shorten it without deriving (not counting the ones in dfeuer's answer).
But in others there is! E.g.
for the Enum instances it's enough to define fromEnum and toEnum, all the rest have default definitions. You certainly don't need to list all possibilities in enumFromTo as your example code does.
After you define instance Enum Value, you can write comparison functions by converting to Int and comparing the results:
instance Eq Value where
  x == y = fromEnum x == fromEnum y
instance Ord Value where
  compare x y = compare (fromEnum x) (fromEnum y)
You can use instances for Value and Suit when writing definitions for Card, e.g.
instance Eq Card where
  Card s1 v1 == Card s2 v2 = s1 == s2 && v1 == v2
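Spelled out for Suite, the hand-written Enum instance could look roughly like this (a sketch; the numbering 0 to 3 and the error message are arbitrary choices):

```haskell
data Suite = Club | Heart | Spade | Diamond

-- Only fromEnum and toEnum are written by hand; enumFromTo and the other
-- Enum methods fall back to their default definitions.
instance Enum Suite where
  fromEnum Club    = 0
  fromEnum Heart   = 1
  fromEnum Spade   = 2
  fromEnum Diamond = 3

  toEnum 0 = Club
  toEnum 1 = Heart
  toEnum 2 = Spade
  toEnum 3 = Diamond
  toEnum n = error ("toEnum: no Suite for " ++ show n)

-- Comparisons go through the Int representation.
instance Eq Suite where
  x == y = fromEnum x == fromEnum y

instance Ord Suite where
  compare x y = compare (fromEnum x) (fromEnum y)
```

Ranges such as [Heart .. Spade] then work through the default enumFromTo, with no need to list the combinations by hand.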

Why do We Need Sum Types?

Imagine a language which doesn't allow multiple value constructors for a data type. Instead of writing
data Color = White | Black | Blue
we would have
data White = White
data Black = Black
data Blue = Blue
type Color = White :|: Black :|: Blue
where :|: (here it's not | to avoid confusion with sum types) is a built-in type union operator. Pattern matching would work in the same way
show :: Color -> String
show White = "white"
show Black = "black"
show Blue = "blue"
As you can see, in contrast to coproducts it results in a flat structure, so you don't have to deal with injections. And, unlike sum types, it allows types to be combined arbitrarily, resulting in greater flexibility and granularity:
type ColorsStartingWithB = Black :|: Blue
I believe it wouldn't be a problem to construct recursive data types as well
data Nil = Nil
data Cons a = Cons a (List a)
type List a = Cons a :|: Nil
I know union types are present in TypeScript and probably other languages, but why did the Haskell committee choose ADTs over them?

Haskell's sum type is very similar to your :|:.
The difference between the two is that the Haskell sum type | is a tagged union, while your "sum type" :|: is untagged.
Tagged means every instance is unique - you can distinguish Int | Int from Int (actually, this holds for any a):
data EitherIntInt = Left Int | Right Int
In this case: Either Int Int carries more information than Int because there can be a Left and Right Int.
In your :|:, you cannot distinguish those two:
type EitherIntInt = Int :|: Int
How do you know if it was a left or right Int?
See the comments for an extended discussion of the section below.
Tagged unions have another advantage: The compiler can verify whether you as the programmer handled all cases, which is implementation-dependent for general untagged unions. Did you handle all cases in Int :|: Int? Either this is isomorphic to Int by definition or the compiler has to decide which Int (left or right) to choose, which is impossible if they are indistinguishable.
Consider another example:
type (Integral a, Num b) => IntegralOrNum a b = a :|: b -- untagged
data (Integral a, Num b) => IntegralOrNum a b = Either a b -- tagged
What is 5 :: IntegralOrNum Int Double in the untagged union? It is both an instance of Integral and Num, so we can't decide for sure and have to rely on implementation details. On the other hand, the tagged union knows exactly what 5 should be because it is branded with either Left or Right.
As for naming: The disjoint union in Haskell is a union type. ADTs are only a means of implementing these.

I will try to expand the categorical argument mentioned by @BenjaminHodgson.
Haskell can be seen as the category Hask, in which objects are types and morphisms are functions between types (disregarding bottom).
We can define a product in Hask as tuple - categorically speaking it meets the definition of the product:
A product of a and b is the type c equipped with projections p and q such that p :: c -> a and q :: c -> b and for any other candidate c' equipped with p' and q' there exists a morphism m :: c' -> c such that we can write p' as p . m and q' as q . m.
Read up on this in Bartosz' Category Theory for Programmers for further information.
Now for every category, there exists the opposite category, which has the same objects but reverses all the arrows. The coproduct is thus:
The coproduct c of a and b is the type c equipped with injections i :: a -> c and j :: b -> c such that for all other candidates c' with i' and j' there exists a morphism m :: c -> c' such that i' = m . i and j' = m . j.
Let's see how the tagged and untagged union perform given this definition:
The untagged union of a and b is the type a :|: b such that:
i :: a -> a :|: b is defined as i a = a and
j :: b -> a :|: b is defined as j b = b
However, we know that a :|: a is isomorphic to a. Based on that observation we can define a second candidate for the coproduct, a :|: a :|: b, which is equipped with the exact same morphisms. Therefore, there is no single best candidate, since the morphism m between a :|: a :|: b and a :|: b is id. id is a bijection, which implies that m is invertible and can "convert" the types either way. A visual representation of that argument. Replace p with i and q with j.
Restricting ourselves to Either, you can verify for yourself that it satisfies the definition with:
i = Left and
j = Right
This shows that the categorical dual of the product type is the disjoint union, not the set-based union.
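The verification for Either can be sketched in code. For any other candidate c' with injections i' and j', the Prelude function either supplies the mediating morphism m. The names mediate and describe below are made up for illustration:

```haskell
-- The injections of the coproduct Either a b.
i :: a -> Either a b
i = Left

j :: b -> Either a b
j = Right

-- For any other candidate c with injections i' and j', the mediating
-- morphism m :: Either a b -> c is built with the Prelude function either.
mediate :: (a -> c) -> (b -> c) -> Either a b -> c
mediate i' j' = either i' j'

-- Example candidate: c' = String with i' = show and j' = id.
describe :: Either Int String -> String
describe = mediate show id
```

Here describe . i behaves like show and describe . j behaves like id, which is exactly i' = m . i and j' = m . j.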
The set union is part of the disjoint union, because we can define it as follows:
data Left a = Left a
data Right b = Right b
type DisjUnion a b = Left a :|: Right b
Because we have shown above that the set union is not a valid candidate for the coproduct of two types, we would lose many "free" properties (which follow from parametricity as leftaroundabout mentioned) by not choosing the disjoint union in the category Hask (because there would be no coproduct).

This is an idea I've thought a lot about myself: a language with “first-class type algebra”. Pretty sure we could do about everything this way that we do in Haskell. Certainly if these disjunctions were, like Haskell alternatives, tagged unions; then you could directly rewrite any ADT to use them. In fact GHC can do this for you: if you derive a Generic instance, a variant type will be represented by a :+: construct, which is in essence just Either.
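To see the :+: representation for yourself, something like the following should do. It is a sketch with a two-constructor type; the function name side is made up, and larger types nest their :+: sums more deeply:

```haskell
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics

data Color = White | Black
  deriving (Generic)

-- Inspect the generic representation: the two constructors become the two
-- sides of a :+: sum (L1 and R1), much like Left and Right of Either.
side :: Color -> String
side c = case from c of
  M1 (L1 _) -> "L1"
  M1 (R1 _) -> "R1"
```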
I'm not so sure if untagged unions would also do. As long as you require the types participating in a sum to be discernibly different, the explicit tagging should in principle not be necessary. The language would then need a convenient way to match on types at runtime. Sounds a lot like what dynamic languages do – obviously comes with quite some overhead though.
The biggest problem would be that if the types on both sides of :|: must be unequal then you lose parametricity, which is one of Haskell's nicest traits.

Given that you mention TypeScript, it is instructive to have a look at what its docs have to say about its union types. The example there starts from a function...
function padLeft(value: string, padding: any) { //etc.
... that has a flaw:
The problem with padLeft is that its padding parameter is typed as any. That means that we can call it with an argument that’s neither a number nor a string
One plausible solution is then suggested, and rejected:
In traditional object-oriented code, we might abstract over the two types by creating a hierarchy of types. While this is much more explicit, it’s also a little bit overkill.
Rather, the handbook suggests...
Instead of any, we can use a union type for the padding parameter:
function padLeft(value: string, padding: string | number) { // etc.
Crucially, the concept of union type is then described in this way:
A union type describes a value that can be one of several types.
A string | number value in TypeScript can be either of string type or of number type, as string and number are subtypes of string | number (cf. Alexis King's comment to the question). An Either String Int value in Haskell, however, is neither of String type nor of Int type -- its only, monomorphic, type is Either String Int. Further implications of that difference show up in the remainder of the discussion:
If we have a value that has a union type, we can only access members that are common to all types in the union.
In a roughly analogous Haskell scenario, if we have, say, an Either Double Int, we cannot apply (2*) directly on it, even though both Double and Int have instances of Num. Rather, something like bimap is necessary.
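A minimal sketch of that bimap usage (the function name doubleEither is made up for illustration):

```haskell
import Data.Bifunctor (bimap)

-- Double whichever number is inside, keeping the Left/Right tag intact.
-- (2 *) has to be supplied once per side, because the two sides have
-- different types.
doubleEither :: Either Double Int -> Either Double Int
doubleEither = bimap (2 *) (2 *)
```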
What happens when we need to know specifically whether we have a Fish? [...] we’ll need to use a type assertion:
let pet = getSmallPet();
if ((<Fish>pet).swim) {
    (<Fish>pet).swim();
}
else {
    (<Bird>pet).fly();
}
This sort of downcasting/runtime type checking is at odds with how the Haskell type system ordinarily works, even though it can be implemented using the very same type system (also cf. leftaroundabout's answer). In contrast, there is nothing to figure out at runtime about the type of an Either Fish Bird: the case analysis happens at value level, and there is no need to deal with anything failing and producing Nothing (or worse, null) due to runtime type mismatches.

Haskell way to go about enums

I want to represent a type of the following form :
(Card, Suit)
to represent cards in a card game where Card instances would be in the set:
{2, 3, 4, 5, 6, 7, 8, 9, J, Q, K, 1}
and Suit would have instances in the set:
{S, D, H, C}
I'd handle that with two Data declarations if that wasn't for the numbers:
data Suit = S | D | H | C deri...
but obviously adding numbers to those null arity types will fail.
So my question is, how to simulate the kind of enum you find in C?
I guess I'm misunderstanding a basic point of the type system, and help will be appreciated!
EDIT: I'll add some context: I want to represent the data contained in this Euler problem, as you can check, the data is represented in the form of 1S for an ace of spade, 2D for a 2 of diamond, etc...
What I'd really like is to be able to perform a read operation directly on the string to obtain the corresponding object.

I actually happen to have an implementation handy from when I was developing a poker bot. It's not particularly sophisticated, but it does work.
First, the relevant types. Ranks and suits are enumerations, while cards are the obvious compound type (with a custom Show instance)
import Text.ParserCombinators.Parsec
data Suit = Clubs | Diamonds | Hearts | Spades deriving (Eq,Ord,Enum,Bounded,Show)
data Rank = Two | Three | Four | Five | Six | Seven | Eight | Nine | Ten
          | Jack | Queen | King | Ace deriving (Eq,Ord,Enum,Bounded,Show)
-- Deriving Bounded for Card requires Bounded instances for Rank and Suit.
data Card = Card { rank :: Rank
                 , suit :: Suit } deriving (Eq,Ord,Bounded)
instance Show Card where
  show (Card rank suit) = show rank ++ " of " ++ show suit
Then we have the parsing code, which uses Parsec. You could develop this to be much more sophisticated, to return better error messages, etc.
Note that, as Matvey said in the comments, the problem of parsing strings into their representations in the program is (or rather should be) orthogonal to how the enums are represented. Here I've cheated and broken the orthogonality: if you wanted to re-order the ranks (e.g. to have Ace rank below Two) then you would break the parsing code, because the parser depends on the internal representation of Two being 0, Three being 1 etc..
A better approach would be to spell out all of the ranks in parseRank explicitly (which is what I do in the original code). I wrote it like this to (a) save some space, (b) illustrate how it's possible in principle to parse a number into a rank, and (c) give you an example of bad practice explicitly spelled out, so you can avoid it in the future.
parseSuit :: Parser Suit
parseSuit = do s <- oneOf "SDCH"
               return $ case s of
                 'S' -> Spades
                 'D' -> Diamonds
                 'H' -> Hearts
                 'C' -> Clubs
parseRank :: Parser Rank
parseRank = do r <- oneOf "23456789TJQKA"
               return $ case r of
                 'T' -> Ten
                 'J' -> Jack
                 'Q' -> Queen
                 'K' -> King
                 'A' -> Ace
                 n -> toEnum (read [n] - 2)
parseCard :: Parser Card
parseCard = do r <- parseRank
               s <- parseSuit
               return $ Card { rank = r, suit = s }
readCard :: String -> Either ParseError Card
readCard str = parse parseCard "" str
And here it is in action:
*Cards> readCard "2C"
Right Two of Clubs
*Cards> readCard "JH"
Right Jack of Hearts
*Cards> readCard "AS"
Right Ace of Spades
Edit:
@yatima2975 mentioned in the comments that you might be able to have some fun playing with OverloadedStrings. I haven't been able to get it to do much that's useful, but it seems promising. First you need to enable the language option by putting {-# LANGUAGE OverloadedStrings #-} at the top of your file, and include the line import GHC.Exts ( IsString(..) ) to import the relevant typeclass. Then you can make a Card into a string literal:
instance IsString Card where
  fromString str = case readCard str of Right c -> c
This allows you to pattern-match on the string representation of your card, rather than having to write out the types explicitly:
isAce :: Card -> Bool
isAce "AH" = True
isAce "AC" = True
isAce "AD" = True
isAce "AS" = True
isAce _ = False
You can also use the string literals as input to functions:
printAces = do
  let cards = ["2H", "JH", "AH"]
  mapM_ (\x -> putStrLn $ show x ++ ": " ++ show (isAce x)) cards
And here it is in action:
*Cards> printAces
Two of Hearts: False
Jack of Hearts: False
Ace of Hearts: True

data Card = Two | Three | Four | Five | Six
          | Seven | Eight | Nine | Ten
          | Jack | Queen | King | Ace
  deriving Enum
Implementing the Enum typeclass means you can use fromEnum and toEnum to convert between Card and Int.
However, if it's important to you that fromEnum Two is 2, you will have to implement the Enum instance for Card by hand. (The autoderived instance starts at 0, just like C, but there's no way of overriding that without doing it all yourself.)
n.b. You might not need Enum --- if all you want is to use operators like < and == with your Cards, then you need to use deriving (Eq, Ord).
Edit:
You cannot use read to turn a String of the form "2S" or "QH" into a (Card, Suit) because read will expect the string to look like "(a,b)" (e.g. "(2,S)" in the form you initially asked for, or "(Two,S)" in the form I suggested above).
You will have to write a function to parse the string yourself. You could use a parser (e.g. Parsec or Attoparsec), but in this case it should be simple enough to write by hand.
e.g.
{-# LANGUAGE TupleSections #-}
parseSuit :: String -> Maybe Suit
parseSuit "S" = Just S
...
parseSuit _ = Nothing
parseCard :: String -> Maybe (Card, Suit)
parseCard ('2' : s) = fmap (Two,) (parseSuit s)
...
parseCard _ = Nothing
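Filled out, a hand-written parser along these lines might look like the sketch below. The character encoding ('T' for ten, 'A' for ace) and the helper parseRank are assumptions; adjust them to whatever your input actually uses:

```haskell
data Suit = S | D | H | C deriving (Show, Eq)

data Card = Two | Three | Four | Five | Six
          | Seven | Eight | Nine | Ten
          | Jack | Queen | King | Ace
  deriving (Show, Eq, Enum)

parseSuit :: String -> Maybe Suit
parseSuit "S" = Just S
parseSuit "D" = Just D
parseSuit "H" = Just H
parseSuit "C" = Just C
parseSuit _   = Nothing

-- Assumed encoding (not fixed by the answer): '2'..'9' for number cards,
-- 'T' for ten, and 'J', 'Q', 'K', 'A' for the rest.
parseRank :: Char -> Maybe Card
parseRank c = lookup c (zip "23456789TJQKA" [Two ..])

-- The first character is the rank, the remainder must be exactly one suit.
parseCard :: String -> Maybe (Card, Suit)
parseCard (r : s) = (,) <$> parseRank r <*> parseSuit s
parseCard _       = Nothing
```

Using lookup keeps the mapping from characters to ranks in one place, so reordering the constructors only requires touching the string passed to zip.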

I’d just prefix the numbers with a letter, or better yet, a word. I’d also not use too many one-letter abbreviations – H, K etc. are downright unreadable.
data Suit = Club | Spade | Heart | Diamond
data Card = Card1 | Card2 | … | Jack | Queen | King | Ace
… But I even prefer dave’s suggestion of using the number words (One, Two) for values instead.

How to create a type bounded within a certain range

I would like to create a new integral type which is bounded to a certain range. I have tried:
data PitchClass = PC Int deriving (Ord, Eq, Show)
instance Bounded PitchClass where
  minBound = PC 0
  maxBound = PC 11
However, what I want is something that will fail if something like
PC 12
or
PC (-1)
is attempted.
Is the general approach for placing constraints on the creation of values of a new type to not export the value constructors from the module, and instead to export functions that perform the constraint checks and return values of the type?

Yes, not exporting the data constructor from the module is the way to go.
Instead, you export a function which does the checking as you said. This is often called a smart constructor.
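A minimal sketch of such a smart constructor for PitchClass follows. The names mkPitchClass and pitchValue are hypothetical, and returning Maybe is just one possible way to signal failure:

```haskell
-- In a real project this would sit in its own module whose export list
-- omits the PC constructor, for example:
--   module PitchClass (PitchClass, mkPitchClass, pitchValue) where

newtype PitchClass = PC Int deriving (Ord, Eq, Show)

instance Bounded PitchClass where
  minBound = PC 0
  maxBound = PC 11

-- Smart constructor (hypothetical name): the only exported way to build
-- a value, so out-of-range numbers never become a PitchClass.
mkPitchClass :: Int -> Maybe PitchClass
mkPitchClass n
  | n >= 0 && n <= 11 = Just (PC n)
  | otherwise         = Nothing

-- Hypothetical accessor, since clients cannot pattern match on PC.
pitchValue :: PitchClass -> Int
pitchValue (PC n) = n
```

Callers then have to handle the Nothing case for inputs like 12 or -1 explicitly, instead of getting an invalid value.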

An alternate solution for cases where the number of total values is this small is to simply enumerate the possible constructors.
data PitchClass = A | Bb | B | C | Db | D | Eb | E | F | Gb | G | Ab
  deriving (Eq, Ord, Bounded, Show, Read)
There are half a dozen different hacks you can try from here to make it more convenient in various ways; for example, you can derive Enum to get toEnum . fromEnum = id (and toEnum (-1) = {- an exception -}), or you can write a custom Integral instance to get 0 = A (and your choice of behavior for -1).
