haskell - Cannot parse data constructor in a data/newtype declaration: [Either Int Int] - haskell

I'm trying to declare a datatype that is a list of Either types.
data EitherInts = [Either Int Int]
But when I try to compile this type I get an error:
Cannot parse data constructor in a data/newtype declaration: [Either Int Int]
I have no idea why. What am I doing wrong?

data is for defining new algebraic data types, which must each have their own constructor(s). So you could write
data EitherInts = EitherInts [Either Int Int]
But you probably don't mean this: you want some kind of type synonym. There are two possibilities:
type EitherIntsType = [Either Int Int]
or
newtype EitherIntsNewtype = EitherInts [Either Int Int]
The first, using type, acts exactly like [Either Int Int]: a value such as [Left 2] is a valid value of the new EitherIntsType type. It's just a new, abbreviated name for that existing type, and can be used interchangeably with it. This approach is fine when you just want to be able to use a different name for an existing type: for clarity, or to give a shorter name to a much longer type.
The second, using newtype, is more heavy-weight. It acts more like data, with its own constructor. Values like [Left 2] are not valid values of the new EitherIntsNewtype type. Instead, you must write EitherintsNewtype [Left 2] to lift an existing list value, and you must pattern-match to extract values. This approach is better when you want the compiler's help making sure you don't mix up lifted and unlifted values, or when you want something like an existing type but with different typeclass instances (because you can give your newtype its own typeclass instances, which a type can't have).

Related

Haskell - Bags - How can I use polymorphism in Haskell?

I have just started learning Haskell and still haven't grasped Functional Programming. I need to create a polymorphic datatype whose type I don't know until one of the functions I've written is run. The program seems to want me to build a list of tuples out of a list, e.g.:
['Car', 'Car', 'Motorcycle', 'Motorcycle', 'Motorcycle', 'Truck'] would be converted to [('Car', 2), ('Motorcycle', 3), ('Truck', 1)].
Within a same list of tuples (a bag), all elements will be of the same type, but different bags may contain other types. Right now, my datatype declaration (I'm not sure if it's called 'declaration' in FP) goes:
type Amount = Int
data Bag a = [(a, Amount)]
However, when I try to load the module, I get this error:
Cannot parse data constructor in a data/newtype declaration: [(a, Amount)]
If I change data to type in the declaration, I get this error message for all functions:
Expecting one more argument to ‘Bag’
Expected a type, but ‘Bag’ has kind ‘* -> *’
Is there something I'm not grasping about FP or is it a code error?, and more importantly, how can I declare this in a way that actually allows me to load the module into GHCi?
Defining a data type
This is not about functional programming itself. If you define a datatype (or newtype), in Haskell it needs a data constructor (for newtype there can only be one data constructor, and with one parameter). [(a, Amount)] is however not a good "name" for a data constructor (well you did not intend to use that as a data constructor anyway).
We can here write a data constructor like:
data Bag a = Bag [(a, Amount)]
and since here Bag contains (likely) one data constructor with one parameter, we can make it a newtype:
newtype Bag a = Bag [(a, Amount)]
The above however might not be necessary: you might want to declare a type alias with type:
type Bag a = [(a, Amount)]
in that case, you did not construct a new type, but you can write Bag a, and "behind the curtains", Haskell will replace this with [(a, Amount)].
Define functions with Bag
In case you now want to define a function that processes a Bag, you will need to specify the parameter a in the signature as well, for example:
count :: Eq a => [a] -> Bag a
count = -- ...
Now it is clear that we transform a list of as, in a Bag of as.

Invalid data type

The following code is invalid:
data A = Int | [Int] | (Int, Int)
Why is it ok to use the concrete type Int as part of a data type definition, but not the concrete type [Int] or (Int, Int)?
Why is it ok to use the concrete type Int as part of a data type definition (..)
It is not OK.
What you here have written is the definition of a data type A that has a constructor named Int. This has nothing to do with the data type Int, it is simply a coincidence that the name of the constructor is the same as the name of a type, but that is not a problem for the Haskell compiler, since Haskell has a clear distinction between types and constructor names.
You can not use [Int] however, since [Int] is not an identifier (it starts with an open square bracket), nor is it an operator (that can only use symbols). So Haskell does not really knows how to deal with this and errors on this input.
If you want to define a datatype that can take an Int value, you need to add it as a parameter. You can also define constructors for your [Int] and (Int, Int) parameters. For instance:
data A = Int Int | Ints [Int] | Int2 (Int,Int)
So here there are three constructors: Int, Ints, and Int2. And the first constructor takes an Int as parameter, the second one an [Int], and the last one an (Int, Int).
That being said, this will probably result into a lot of confusion, so it is better to use constructor names that cause less confusion, for instance:
data A = A Int | As [Int] | A2 (Int,Int)
Note that the A of data A can be used in the signature of the functions, whereas the constructors (in boldcase) are used as values (so in the implementation of the function, that is the pattern matching in the head of the clauses, and in order to construct a value in the body of the clauses).

Redundancy regarding product types and tuples in Haskell

In Haskell you have product types and you have tuples.
You use tuples if you don't want to associate a dedicated type with the value, and you can use product types if you wish to do so.
However I feel there is redundancy in the notation of product types
data Foo = Foo (String, Int, Char)
data Bar = Bar String Int Char
Why are there both kinds of notations? Is there any case where you would prefer one the other?
I guess you can't use record notation when using tuples, but that's just a convenience problem. Another thing might be the notion of order in tuples, as opposed to product types, but I think that's just due to the naming of the functions fst and snd.
#chi's answer is about the technical differences in terms of Haskell's evaluation model. I hope to give you some insight into the philosophy of this sort of typed programming.
In category theory we generally work with objects "up to isomorphism". Your Bar is of course isomorphic to (String, Int, Char), so from a categorical perspective they're the same thing.
bar_tuple :: Iso' Bar (String, Int, Char)
bar_tuple = iso to from
where to (Bar s i c) = (s, i, c)
from (s, i, c) = Bar s i c
In some sense tuples are a Platonic form of product type, in that they have no meaning beyond being a collection of disparate values. All the other product types can be mapped to and from a plain old tuple.
So why not use tuples everywhere, when all Haskell types ultimately boil down to a sum of products? It's about communication. As Martin Fowler says,
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
Names are important! Writing down a custom product type like
data Customer = Customer { name :: String, address :: String }
imbues the type Customer with meaning to the person reading the code, unlike (String, String) which just means "two strings".
Custom types are particularly useful when you want to enforce invariants by hiding the representation of your data and using smart constructors:
newtype NonEmpty a = NonEmpty [a]
nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty [] = Nothing
nonEmpty xs = Just (NonEmpty xs)
Now, if you don't export the NonEmpty constructor, you can force people to go through the nonEmpty smart constructor. If someone hands you a NonEmpty value you may safely assume that it has at least one element.
You can of course represent Customer as a tuple under the hood and expose evocatively-named field accessors,
newtype Customer = Bar (String, String)
name, address :: Customer -> String
name (Customer (n, a)) = n
address (Customer (n, a)) = a
but this doesn't really buy you much, except that it's now cheaper to convert Customer to a tuple (if, say, you're writing performance-sensitive code that works with a tuple-oriented API).
If your code is intended to solve a particular problem - which of course is the whole point of writing code - it pays to not just solve the problem, but make it look like you've solved it too. Someone - maybe you in a couple of years - is going to have to read this code and understand it with no a priori knowledge of how it works. Custom types are a very important communication tool in this regard.
The type
data Foo = Foo (String, Int, Char)
represents a double-lifted tuple. It values comprise
undefined
Foo undefined
Foo (undefined, undefined, undefined)
etc.
This is usually troublesome. Because of this, it's rare to see such definitions in actual code. We either have plain data types
data Foo = Foo String Int Char
or newtypes
newtype Foo = Foo (String, Int, Char)
The newtype can be just as inconvenient to use, but at least it
does not double-lift the tuple: undefined and Foo undefined are now equal values.
The newtype also provides zero-cost conversion between a plain tuple and Foo, in both directions.
You can see such newtypes in use e.g. when the programmer needs a different instance for some type class, than the one already associated with the tuple. Or, perhaps, it is used in a "smart constructor" idiom.
I would not expect the pattern used in Foo to be frequent. There is slight difference in what the constructor acts like: Foo :: (String, Int, Char) -> Foo as opposed to Bar :: String -> Int -> Char -> Bar. Then Foo undefined and Foo (undefined, ..., ...) are strictly speaking different things, whereas you miss one level of undefinedness in Bar.

Making a type polymorphic

So i'm defining a type which a list of tuples basically and I can't work out how to make it polymorphic.
so far i've got
module ListTup where
type ListTup = [(Char, String)]
and I was wondering if it was possible to make it so that the Char part could be anything e.i String, int what ever.
Is it possible?
I tried to use the Maybe Type but it throw a ton of errors my way
You can include type variables when defining type synonyms, like so:
type ListTup a = [(a, String)].
It depends on what you want to do. If you want different lists each of which will have the same type in the tuple's first element, you can parametrise the type constructor, like C.Quilley suggested above.
If you want each of your lists to be able to have different types, you can box all required types in an algebraic type (discriminated union):
data MyKey = MyCharKey Char | MyStringKey String | MyIntKey Int
type ListTup = [(MyKey, String)]
You cannot have "whatever", because the types need to be decidable at compilation time.

Why do I have to use newtype when my data type declaration only has one constructor? [duplicate]

This question already has answers here:
Difference between `data` and `newtype` in Haskell
(2 answers)
Closed 8 years ago.
It seems that a newtype definition is just a data definition that obeys some restrictions (e.g., only one constructor), and that due to these restrictions the runtime system can handle newtypes more efficiently. And the handling of pattern matching for undefined values is slightly different.
But suppose Haskell would only knew data definitions, no newtypes: couldn't the compiler find out for itself whether a given data definition obeys these restrictions, and automatically treat it more efficiently?
I'm sure I'm missing out on something, there must be some deeper reason for this.
Both newtype and the single-constructor data introduce a single value constructor, but the value constructor introduced by newtype is strict and the value constructor introduced by data is lazy. So if you have
data D = D Int
newtype N = N Int
Then N undefined is equivalent to undefined and causes an error when evaluated. But D undefined is not equivalent to undefined, and it can be evaluated as long as you don't try to peek inside.
Couldn't the compiler handle this for itself.
No, not really—this is a case where as the programmer you get to decide whether the constructor is strict or lazy. To understand when and how to make constructors strict or lazy, you have to have a much better understanding of lazy evaluation than I do. I stick to the idea in the Report, namely that newtype is there for you to rename an existing type, like having several different incompatible kinds of measurements:
newtype Feet = Feet Double
newtype Cm = Cm Double
both behave exactly like Double at run time, but the compiler promises not to let you confuse them.
According to Learn You a Haskell:
Instead of the data keyword, the newtype keyword is used. Now why is
that? Well for one, newtype is faster. If you use the data keyword to
wrap a type, there's some overhead to all that wrapping and unwrapping
when your program is running. But if you use newtype, Haskell knows
that you're just using it to wrap an existing type into a new type
(hence the name), because you want it to be the same internally but
have a different type. With that in mind, Haskell can get rid of the
wrapping and unwrapping once it resolves which value is of what type.
So why not just use newtype all the time instead of data then? Well,
when you make a new type from an existing type by using the newtype
keyword, you can only have one value constructor and that value
constructor can only have one field. But with data, you can make data
types that have several value constructors and each constructor can
have zero or more fields:
data Profession = Fighter | Archer | Accountant
data Race = Human | Elf | Orc | Goblin
data PlayerCharacter = PlayerCharacter Race Profession
When using newtype, you're restricted to just one constructor with one
field.
Now consider the following type:
data CoolBool = CoolBool { getCoolBool :: Bool }
It's your run-of-the-mill algebraic data type that was defined with
the data keyword. It has one value constructor, which has one field
whose type is Bool. Let's make a function that pattern matches on a
CoolBool and returns the value "hello" regardless of whether the Bool
inside the CoolBool was True or False:
helloMe :: CoolBool -> String
helloMe (CoolBool _) = "hello"
Instead of applying this function to a normal CoolBool, let's throw it a curveball and apply it to undefined!
ghci> helloMe undefined
"*** Exception: Prelude.undefined
Yikes! An exception! Now why did this exception happen? Types defined
with the data keyword can have multiple value constructors (even
though CoolBool only has one). So in order to see if the value given
to our function conforms to the (CoolBool _) pattern, Haskell has to
evaluate the value just enough to see which value constructor was used
when we made the value. And when we try to evaluate an undefined
value, even a little, an exception is thrown.
Instead of using the data keyword for CoolBool, let's try using
newtype:
newtype CoolBool = CoolBool { getCoolBool :: Bool }
We don't have to
change our helloMe function, because the pattern matching syntax is
the same if you use newtype or data to define your type. Let's do the
same thing here and apply helloMe to an undefined value:
ghci> helloMe undefined
"hello"
It worked! Hmmm, why is that? Well, like we've said, when we use
newtype, Haskell can internally represent the values of the new type
in the same way as the original values. It doesn't have to add another
box around them, it just has to be aware of the values being of
different types. And because Haskell knows that types made with the
newtype keyword can only have one constructor, it doesn't have to
evaluate the value passed to the function to make sure that it
conforms to the (CoolBool _) pattern because newtype types can only
have one possible value constructor and one field!
This difference in behavior may seem trivial, but it's actually pretty
important because it helps us realize that even though types defined
with data and newtype behave similarly from the programmer's point of
view because they both have value constructors and fields, they are
actually two different mechanisms. Whereas data can be used to make
your own types from scratch, newtype is for making a completely new
type out of an existing type. Pattern matching on newtype values isn't
like taking something out of a box (like it is with data), it's more
about making a direct conversion from one type to another.
Here's another source. According to this Newtype article:
A newtype declaration creates a new type in much the same way as data.
The syntax and usage of newtypes is virtually identical to that of
data declarations - in fact, you can replace the newtype keyword with
data and it'll still compile, indeed there's even a good chance your
program will still work. The converse is not true, however - data can
only be replaced with newtype if the type has exactly one constructor
with exactly one field inside it.
Some Examples:
newtype Fd = Fd CInt
-- data Fd = Fd CInt would also be valid
-- newtypes can have deriving clauses just like normal types
newtype Identity a = Identity a
deriving (Eq, Ord, Read, Show)
-- record syntax is still allowed, but only for one field
newtype State s a = State { runState :: s -> (s, a) }
-- this is *not* allowed:
-- newtype Pair a b = Pair { pairFst :: a, pairSnd :: b }
-- but this is:
data Pair a b = Pair { pairFst :: a, pairSnd :: b }
-- and so is this:
newtype Pair' a b = Pair' (a, b)
Sounds pretty limited! So why does anyone use newtype?
The short version The restriction to one constructor with one field
means that the new type and the type of the field are in direct
correspondence:
State :: (s -> (a, s)) -> State s a
runState :: State s a -> (s -> (a, s))
or in mathematical terms they are isomorphic. This means that after
the type is checked at compile time, at run time the two types can be
treated essentially the same, without the overhead or indirection
normally associated with a data constructor. So if you want to declare
different type class instances for a particular type, or want to make
a type abstract, you can wrap it in a newtype and it'll be considered
distinct to the type-checker, but identical at runtime. You can then
use all sorts of deep trickery like phantom or recursive types without
worrying about GHC shuffling buckets of bytes for no reason.
See the article for the messy bits...
Simple version for folks obsessed with bullet lists (failed to find one, so have to write it by myself):
data - creates new algebraic type with value constructors
Can have several value constructors
Value constructors are lazy
Values can have several fields
Affects both compilation and runtime, have runtime overhead
Created type is a distinct new type
Can have its own type class instances
When pattern matching against value constructors, WILL be evaluated at least to weak head normal form (WHNF) *
Used to create new data type (example: Address { zip :: String, street :: String } )
newtype - creates new “decorating” type with value constructor
Can have only one value constructor
Value constructor is strict
Value can have only one field
Affects only compilation, no runtime overhead
Created type is a distinct new type
Can have its own type class instances
When pattern matching against value constructor, CAN be not evaluated at all *
Used to create higher level concept based on existing type with distinct set of supported operations or that is not interchangeable with original type (example: Meter, Cm, Feet is Double)
type - creates an alternative name (synonym) for a type (like typedef in C)
No value constructors
No fields
Affects only compilation, no runtime overhead
No new type is created (only a new name for existing type)
Can NOT have its own type class instances
When pattern matching against data constructor, behaves the same as original type
Used to create higher level concept based on existing type with the same set of supported operations (example: String is [Char])
[*] On pattern matching laziness:
data DataBox a = DataBox Int
newtype NewtypeBox a = NewtypeBox Int
dataMatcher :: DataBox -> String
dataMatcher (DataBox _) = "data"
newtypeMatcher :: NewtypeBox -> String
newtypeMatcher (NewtypeBox _) = "newtype"
ghci> dataMatcher undefined
"*** Exception: Prelude.undefined
ghci> newtypeMatcher undefined
“newtype"
Off the top of my head; data declarations use lazy evaluation in access and storage of their "members", whereas newtype does not. Newtype also strips away all previous type instances from its components, effectively hiding its implementation; whereas data leaves the implementation open.
I tend to use newtype's when avoiding boilerplate code in complex data types where I don't necessarily need access to the internals when using them. This speeds up both compilation and execution, and reduces code complexity where the new type is used.
When first reading about this I found this chapter of a Gentle Introduction to Haskell rather intuitive.

Resources