Type casting when working with nested data structures - haskell

I have the following data structures defined:
data Operator = Plus | Times | Minus deriving (Eq,Show)
data Variable = A | B | C deriving (Eq,Show)
newtype Const = D Numeral deriving (Eq,Show)
data CVO = Const | Variable | Operator deriving (Eq,Show)
type Expr = [CVO]
I have defined the following function:
eval2 :: Expr -> Integer
eval2 x = helper x
I would like to check if an element of the CVO list (Expr) is either an instance of Const, Variable or Operator (this works) and I would like to implement varying code for the specific type of the instance (e.g. Plus, Times, Minus for Operator).
helper :: Expr -> Integer
helper [] = 2
helper (x:xs)
| x == Operator && x == Plus = 1
I cannot compare x to Plus, because it expects x to be of type CVO.
Couldn't match expected type ‘CVO’ with actual type ‘Operator’
Is it somehow possible to cast x to be an instance of Operator in order to do the comparison?

A value can't have two different types at the same time. If x is a CVO you can't use == to compare it to Plus which is an Operator.
At the moment the type CVO consists of three constant values called Const, Variable and Operator. I'm guessing you actually wanted it to contain values of the type Const, Variable or Operator. You do that by declaring arguments to the constructors.
data CVO = Const Const -- a constructor whose name is Const and contains a value of type Const
| Var Variable -- a constructor named Var containing a Variable
| Op Operator -- a constructor named Op containing an Operator
A given value of type CVO must have been built from one of those three constructors, containing a value of the correct type. You can test which constructor was used to create the CVO, and simultaneously unpack the value, using pattern matching. Something like this:
helper :: Expr -> Integer
helper [] = 0
helper (Op o:xs) -- because we matched Op, we know o :: Operator
| o == Plus = 1
| otherwise = 2
helper _ = 3

Related

Clarifying Data Constructor in Haskell

In the following:
data DataType a = Data a | Datum
I understand that Data Constructor are value level function. What we do above is defining their type. They can be function of multiple arity or const. That's fine. I'm ok with saying Datum construct Datum. What is not that explicit and clear to me here is somehow the difference between the constructor function and what it produce. Please let me know if i am getting it well:
1 - a) Basically writing Data a, is defining both a Data Structure and its Constructor function (as in scala or java usually the class and the constructor have the same name) ?
2 - b) So if i unpack and make an analogy. With Data a We are both defining a Structure(don't want to use class cause class imply a type already i think, but maybe we could) of object (Data Structure), the constructor function (Data Constructor/Value constructor), and the later return an object of that object Structure. Finally The type of that Structure of object is given by the Type constructor. An Object Structure in a sense is just a Tag surrounding a bunch value of some type. Is my understanding correct ?
3 - c) Can I formally Say:
Data Constructor that are Nullary represent constant values -> Return the the constant value itself of which the type is given by the Type Constructor at the definition site.
Data Constructor that takes an argument represent class of values, where class is a Tag ? -> Return an infinite number of object of that class, of which the type is given by the Type constructor at the definition site.
Another way of writing this:
data DataType a = Data a | Datum
Is with generalised algebraic data type (GADT) syntax, using the GADTSyntax extension, which lets us specify the types of the constructors explicitly:
{-# LANGUAGE GADTSyntax #-}
data DataType a where
Data :: a -> DataType a
Datum :: DataType a
(The GADTs extension would work too; it would also allow us to specify constructors with different type arguments in the result, like DataType Int vs. DataType Bool, but that’s a more advanced topic, and we don’t need that functionality here.)
These are exactly the types you would see in GHCi if you asked for the types of the constructor functions with :type / :t:
> :{
| data DataType a where
| Data :: a -> DataType a
| Datum :: DataType a
| :}
> :type Data
Data :: a -> DataType a
> :t Datum
Datum :: DataType a
With ExplicitForAll we can also specify the scope of the type variables explicitly, and make it clearer that the a in the data definition is a separate variable from the a in the constructor definitions by also giving them different names:
data DataType a where
Data :: forall b. b -> DataType b
Datum :: forall c. DataType c
Some more examples of this notation with standard prelude types:
data Either a b where
Left :: forall a b. a -> Either a b
Right :: forall a b. b -> Either a b
data Maybe a where
Nothing :: Maybe a
Just :: a -> Maybe a
data Bool where
False :: Bool
True :: Bool
data Ordering where
LT, EQ, GT :: Ordering -- Shorthand for repeated ‘:: Ordering’
I understand that Data Constructor are value level function. What we do above is defining their type. They can be function of multiple arity or const. That's fine. I'm ok with saying Datum construct Datum. What is not that explicit and clear to me here is somehow the difference between the constructor function and what it produce.
Datum and Data are both “constructors” of DataType a values; neither Datum nor Data is a type! These are just “tags” that select between the possible varieties of a DataType a value.
What is produced is always a value of type DataType a for a given a; the constructor selects which “shape” it takes.
A rough analogue of this is a union in languages like C or C++, plus an enumeration for the “tag”. In pseudocode:
enum Tag {
DataTag,
DatumTag,
}
// A single anonymous field.
struct DataFields<A> {
A field1;
}
// No fields.
struct DatumFields<A> {};
// A union of the possible field types.
union Fields<A> {
DataFields<A> data;
DatumFields<A> datum;
}
// A pair of a tag with the fields for that tag.
struct DataType<A> {
Tag tag;
Fields<A> fields;
}
The constructors are then just functions returning a value with the appropriate tag and fields. Pseudocode:
<A> DataType<A> newData(A x) {
DataType<A> result;
result.tag = DataTag;
result.fields.data.field1 = x;
return result;
}
<A> DataType<A> newDatum() {
DataType<A> result;
result.tag = DatumTag;
// No fields.
return result;
}
Unions are unsafe, since the tag and fields can get out of sync, but sum types are safe because they couple these together.
A pattern-match like this in Haskell:
case someDT of
Datum -> f
Data x -> g x
Is a combination of testing the tag and extracting the fields. Again, in pseudocode:
if (someDT.tag == DatumTag) {
f();
} else if (someDT.tag == DataTag) {
var x = someDT.fields.data.field1;
g(x);
}
Again this is coupled in Haskell to ensure that you can only ever access the fields if you have checked the tag by pattern-matching.
So, in answer to your questions:
1 - a) Basically writing Data a, is defining both a Data Structure and its Constructor function (as in scala or java usually the class and the constructor have the same name) ?
Data a in your original code is not defining a data structure, in that Data is not a separate type from DataType a, it’s just one of the possible tags that a DataType a value may have. Internally, a value of type DataType Int is one of the following:
The tag for Data (in GHC, a pointer to an “info table” for the constructor), and a reference to a value of type Int.
x = Data (1 :: Int) :: DataType Int
+----------+----------------+ +---------+----------------+
x ---->| Data tag | pointer to Int |---->| Int tag | unboxed Int# 1 |
+----------+----------------+ +---------+----------------+
The tag for Datum, and no other fields.
y = Datum :: DataType Int
+-----------+
y ----> | Datum tag |
+-----------+
In a language with unions, the size of a union is the maximum of all its alternatives, since the type must support representing any of the alternatives with mutation. In Haskell, since values are immutable, they don’t require any extra “padding” since they can’t be changed.
It’s a similar situation for standard data types, e.g., a product or sum type:
(x :: X, y :: Y) :: (X, Y)
+---------+--------------+--------------+
| (,) tag | pointer to X | pointer to Y |
+---------+--------------+--------------+
Left (m :: M) :: Either M N
+-----------+--------------+
| Left tag | pointer to M |
+-----------+--------------+
Right (n :: N) :: Either M N
+-----------+--------------+
| Right tag | pointer to N |
+-----------+--------------+
2 - b) So if i unpack and make an analogy. With Data a We are both defining a Structure(don't want to use class cause class imply a type already i think, but maybe we could) of object (Data Structure), the constructor function (Data Constructor/Value constructor), and the later return an object of that object Structure. Finally The type of that Structure of object is given by the Type constructor. An Object Structure in a sense is just a Tag surrounding a bunch value of some type. Is my understanding correct ?
This is sort of correct, but again, the constructors Data and Datum aren’t “data structures” by themselves. They’re just the names used to introduce (construct) and eliminate (match) values of type DataType a, for some type a that is chosen by the caller of the constructors to fill in the forall
data DataType a = Data a | Datum says:
If some term e has type T, then the term Data e has type DataType T
Inversely, if some value of type DataType T matches the pattern Data x, then x has type T in the scope of the match (case branch or function equation)
The term Datum has type DataType T for any type T
3 - c) Can I formally Say:
Data Constructor that are Nullary represent constant values -> Return the the constant value itself of which the type is given by the Type Constructor at the definition site.
Data Constructor that takes an argument represent class of values, where class is a Tag ? -> Return an infinite number of object of that class, of which the type is given by the Type constructor at the definition site.
Not exactly. A type constructor like DataType :: Type -> Type, Maybe :: Type -> Type, or Either :: Type -> Type -> Type, or [] :: Type -> Type (list), or a polymorphic data type, represents an “infinite” family of concrete types (Maybe Int, Maybe Char, Maybe (String -> String), …) but only in the same way that id :: forall a. a -> a represents an “infinite” family of functions (id :: Int -> Int, id :: Char -> Char, id :: String -> String, …).
That is, the type a here is a parameter filled in with an argument value given by the caller. Usually this is implicit, through type inference, but you can specify it explicitly with the TypeApplications extension:
-- Akin to: \ (a :: Type) -> \ (x :: a) -> x
id :: forall a. a -> a
id x = x
id #Int :: Int -> Int
id #Int 1 :: Int
Data :: forall a. a -> DataType a
Data #Char :: Char -> DataType Char
Data #Char 'x' :: DataType Char
The data constructors of each instantiation don’t really have anything to do with each other. There’s nothing in common between the instantiations Data :: Int -> DataType Int and Data :: Char -> DataType Char, apart from the fact that they share the same tag name.
Another way of thinking about this in Java terms is with the visitor pattern. DataType would be represented as a function that accepts a “DataType visitor”, and then the constructors don’t correspond to separate data types, they’re just the methods of the visitor which accept the fields and return some result. Writing the equivalent code in Java is a worthwhile exercise, but here it is in Haskell:
{-# LANGUAGE RankNTypes #-}
-- (Allows passing polymorphic functions as arguments.)
type DataType a
= forall r. -- A visitor with a generic result type
r -- With one “method” for the ‘Datum’ case (no fields)
-> (a -> r) -- And one for the ‘Data’ case (one field)
-> r -- Returning the result
newData :: a -> DataType a
newData field = \ _visitDatum visitData -> visitData field
newDatum :: DataType a
newDatum = \ visitDatum _visitData -> visitDatum
Pattern-matching is simply running the visitor:
matchDT :: DataType a -> b -> (a -> b) -> b
matchDT dt visitDatum visitData = dt visitDatum visitData
-- Or: matchDT dt = dt
-- Or: matchDT = id
-- case someDT of { Datum -> f; Data x -> g x }
-- f :: r
-- g :: a -> r
-- someDT :: DataType a
-- :: forall r. r -> (a -> r) -> r
someDT f (\ x -> g x)
Similarly, in Haskell, data constructors are just the ways of introducing and eliminating values of a user-defined type.
What is not that explicit and clear to me here is somehow the difference between the constructor function and what it produce
I'm having trouble following your question, but I think you are complicating things. I would suggest not thinking too deeply about the "constructor" terminology.
But hopefully the following helps:
Starting simple:
data DataType = Data Int | Datum
The above reads "Declare a new type named DataType, which has the possible values Datum or Data <some_number> (e.g. Data 42)"
So e.g. Datum is a value of type DataType.
Going back to your example with a type parameter, I want to point out what the syntax is doing:
data DataType a = Data a | Datum
^ ^ ^ These things appear in type signatures (type level)
^ ^ These things appear in code (value level stuff)
There's a bit of punning happening here. so in the data declaration you might see "Data Int" and this is mixing type-level and value-level stuff in a way that you wouldn't see in code. In code you'd see e.g. Data 42 or Data someVal.
I hope that helps a little...

Is there a better way to add attribute field into AST in Haskell?

At first, I have a original AST definition like this:
data Expr = LitI Int | LitB Bool | Add Expr Expr
And I want to generalize it so that each AST node can contains some extra attributes:
data Expr a = LitI Int a | LitB Bool a | Add (Expr a) (Expr a) a
In this way, we can easily attach attribute into each node of the AST:
type ExprWithType = Expr TypeRep
type ExprWithSize = Expr Int
But this solution makes it hard to visit the attribute field, we must use pattern matching and process it case by case:
attribute :: Expr a -> a
attribute e = case e of
LitI _ a -> a
LitB _ a -> a
Add _ _ a -> a
We can image that if we can define our AST via a product type of the original AST and the type variable indicating attribute:
type ExprWithType = (Expr, TypeRep)
type ExprWithSize = (Expr, Int)
Then we can simplify the attribute visiting function like this:
attribute = snd
But we know that, the attribute from the outmost product type will not recursively appears in the subtrees.
So, is there a better solution for this problem?
Generally speaking, When we want to extract common field of different cases of a recursive sum type, we met this problem.
You could "lift" the type of the Expr for example like:
data Expr e = LitI Int | LitB Bool | Add e e
Now we can define a data type like:
data ExprAttr a = ExprAttr {
expression :: Expr (ExprAttr a),
attribute :: a
}
So here the ExprAttr has two parameters, the expression, which is thus an Expression that has ExprAttr as in the tree, and attribute which is an a.
You can thus process ExprAttrs, which is an AST of ExprAttrs. If you want to use a "simple" AST, you can define a type like:
newtype SimExpr = SimExpr (Expr SimExpr)
You might want to take a look at Cofree where f will be your recursive data type after abstracting the concept of recursion as an f-algebra and a will be the type of your annotation.
Nate Faubion gave a very accesible talk about this and similar approaches and you can watch it here: https://www.youtube.com/watch?v=eKkxmVFcd74

How to check for the argument of a function and pass it on?

I'm trying to code a function that takes one of my own commands and its argument if it has got one and passes on the argument only. Of course "int" is not in scope, but if I try to pass it on with x (like "f (x int)") I get a parse error. I'm new to Haskell and grateful for any advice.
data Command = PushC Int | Pop | Push Int
f :: Command -> Int
f x | x == PushC int = int
| x == Pop = 1
| x == Push int = int
| otherwise = -1
I just want to check for any Integer as argument of the command and have it as output. If the command doesen't have an argument output 1.
You really should use pattern matching [Haskell-wiki] here:
f :: Command -> Int
f (PushC n) = n
f (Push n) = n
f Pop = 1
Pattern matching is more effective than using (==) :: Eq a => a -> a -> Bool, first of all, not all types are an instance of Eq, and furthermore you can not match parameters of the data constructor with Eq. You can thus not match some_val == PushC num, and hope that Haskell assigns a value to num, like Prolog does.
Usually (==) and (/=) are only used to check if two values are equal, not to discriminate between the possible data constructors and parameters of a type.
You could use a wild card (_) as last clause here, but it is probably better not to do that: if you later add an extra Command data constructor, the compiler can warn you that not all patterns are covered, and thus you can provide the correct expression. If on the other hand, you use a wildcard, then that clause will automatically be matched with that data constructor, and this might not be the intended effect.

How to create a data type containing a list

I'm trying to create a data type class that contains a list:
data Test = [Int] deriving(Show)
But Haskell can't parse the constructor. What am i doing wrong here and how can I best achieve what I'm trying to do?
An Answer
You need to include a constructor, which you haven't done.
data Test = Test [Int]
Consider reviewing Haskell's several type declarations, their use, and their syntax.
Haskell Type Declarations
data
Allows declaration of zero or more constructors each with zero or more fields.
newtype
Allows declaration of one constructor with one field. This field is strict.
type
Allows creation of a type alias which can be textually interchanged with the type to the right of the equals at any use.
constructor
Allows creation of a value of the declared type. Also allows decomposition of values to obtain the individual fields (via pattern matching)
Examples
data
data Options = OptionA Int | OptionB Integer | OptionC PuppyAges
^ ^ ^ ^ ^ ^ ^
| | Field | | | |
type Constructor | | Constructor |
Constructor Field Field
deriving (Show)
myPuppies = OptionB 1234
newtype
newtype PuppyAges = AgeList [Int] deriving (Show)
myPuppies :: PuppyAges
myPuppies = AgeList [1,2,3,4]
Because the types (PuppyAges) and the constructors (AgeList) are in different namespaces people can and often do use the same name for each, such as newtype X = X [Int].
type
type IntList = [Int]
thisIsThat :: IntList -> [Int]
thisIsThat x = x
constructors (more)
option_is_a_puppy_age :: Options -> Bool
option_is_a_puppy_age (OptionC _) = True
option_is_a_puppy_age () = False
option_has_field_of_value :: Options -> Int -> Bool
option_has_field_of_value (OptionA a) x = x == a
option_has_field_of_value (OptionB b) x = fromIntegral x == b
option_has_field_of_value (OptionC (AgeList cs)) x = x `elem` cs
increment_option_a :: Options -> Options
increment_option_a (OptionA a) = OptionA (a+1)
increment_option_a x = x

Create type with double symbol

The List type is created with
data [] a = [] | a : [a]
But I can't create my own type with the same structure:
data %% a = %% | a : %a%
error: parse error on input `%%'
The List type is created with
data [] a = [] | a : [a]
No, it isn't. If you look at the source (for GHC; other compilers may do it differently), it says
data [] a = MkNil
but this is just a marker for the compiler (not even this, see chepner's comment). This is because
data [] a = [] | a : [a]
isn't legal syntax in Haskell.
What is true is that list works as if it were defined this way: it's entirely equivalent to
data List a = Nil | Cons a (List a)
except for the names.
Type and constructor names must either be alphanumeric names, starting with uppercase
data MyType a b = K a | L b a
or be symbolic infix operators, starting with :
data a :** b = K a | b :+-& a
Both types above are perfectly isomorphic: we only replaced MyType with the infix :** and L with the infix :+-&.
Also note that infixes must be binary, i.e. take two arguments. Alphanumeric names do not have such constraint (e.g. K above only takes one argument).
List syntax [] is specially handled by the compiler, similarly to (,),(,,),... for tuples. Only : follows the general rule (perhaps incidentally).

Resources