Haskell: recursive data type (Parameterized Types)

I have this:
data Val s i a = S s | I i | A a deriving (Show)
to play with non-homogeneous lists in Haskell. For example, I can write something like this (just an example function):
oneDown :: Val String Int [String] -> Either String String
But I would actually like the list to be a list of Vals, i.e. something like:
oneDown :: Val String Int [Val] -> Either String String

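What you're looking for would result in an infinite type, Val String Int [Val String Int [Val String Int [...]]], which Haskell explicitly disallows (a type synonym such as type V = Val String Int [V] is rejected as a cycle). However, we can hide this infinity behind a newtype and the compiler won't complain.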
data Val s i a = S s | I i | A a deriving (Show)
newtype Val' = Val' (Val String Int [Val']) deriving (Show)
It's still doing exactly what your example did (plus some Val' wrappers, which cost nothing at runtime because newtype constructors are erased), but now we can recurse as deeply as we like because the recursion goes through the newtype.
This is essentially what the recursion-schemes library does (with its Fix newtype) to get inductively defined data on which generic recursion schemes can be defined. If you're interested in generalized data types like this, you may want to have a look at that library.
To construct this newly-made type, we have to use the Val' constructor.
let myVal = Val' (A [Val' (I 3), Val' (S "ABC"), Val' (A [])])
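A small self-contained sketch of how functions over this type can recurse into the nested lists. The names depth, leaves and example are hypothetical, introduced only for illustration:

```haskell
-- The recursive newtype from the answer above.
data Val s i a = S s | I i | A a deriving (Show)

newtype Val' = Val' (Val String Int [Val']) deriving (Show)

-- How deeply the A-lists are nested (0 for a bare string or int).
depth :: Val' -> Int
depth (Val' (S _))  = 0
depth (Val' (I _))  = 0
depth (Val' (A vs)) = 1 + maximum (0 : map depth vs)

-- Every string and int at any depth, in left-to-right order.
leaves :: Val' -> [Either String Int]
leaves (Val' (S s))  = [Left s]
leaves (Val' (I i))  = [Right i]
leaves (Val' (A vs)) = concatMap leaves vs

-- A hypothetical sample value with two levels of nesting.
example :: Val'
example = Val' (A [Val' (I 3), Val' (S "ABC"), Val' (A [Val' (I 4)])])
```

Because every nested element is wrapped in Val', the type checker only ever sees an ordinary finite recursive type, so none of this needs a language extension.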

Related

Type design for the AST of my language remembering token locations

I wrote a parser and evaluator for a simple programming language. Here is a simplified version of the types for the AST:
data Value = IntV Int | FloatV Float | BoolV Bool
data Expr = IfE Value [Expr] | VarDefE String Value
type Program = [Expr]
I want error messages to tell the line and column of the source code in which the error occurred. For example, if the value in an If expression is not a boolean, I want the evaluator to show an error saying "expected boolean at line x, column y", with x and y referring to the location of the value.
So, what I need to do is redefine the previous types so that they can store the relevant locations of different things. One option would be to add a location to each constructor for expressions, like so:
type Location = (Int, Int)
data Expr = IfE Value [Expr] Location | VarDef String Value Location
This clearly isn't optimal, because I have to add those Location fields to every possible expression, and if for example a value contained other values, I would need to add locations to that value too:
{-
this would turn into FunctionCall String [Value] [Location],
with one location for each value in the function call
-}
data Value = ... | FunctionCall String [Value]
I came up with another solution, which allows me to add locations to everything:
data Located a = Located Location a
type LocatedExpr = Located Expr
type LocatedValue = Located Value
data Value = IntV Int | FloatV Float | BoolV Bool | FunctionCall String [LocatedValue]
data Expr = IfE LocatedValue [LocatedExpr] | VarDef String LocatedValue
type Program = [LocatedExpr]
However I don't like this that much. First of all, it clutters the definition of the evaluator and pattern matching has an extra layer every time. Also, I don't think saying that a function call takes located values as arguments is quite right. Function calls should take values as arguments, and locations should be metadata that doesn't interfere with the evaluator.
I need help redefining my types so that the solution is as clean as possible. Maybe there is a language extension or a design pattern I don't know about that could be helpful.
There are many ways to annotate an AST! This is half of what’s known as the AST typing problem, the other half being how you manage an AST that changes over the course of compilation. The problem isn’t exactly “solved”: all of the solutions have tradeoffs, and which one to pick depends on your expected use cases. At the end, I’ll go over a few that you might like to investigate.
Whichever method you choose for organising the actual data types, if it makes pattern-matching ugly or unwieldy, the natural solution is PatternSynonyms.
Considering your first example:
{-# Language PatternSynonyms #-}
type Location = (Int, Int)
data Expr
  = LocatedIf Value [Expr] Location
  | LocatedVarDef String Value Location
-- Unidirectional pattern synonyms which ignore the location:
pattern If :: Value -> [Expr] -> Expr
pattern If val exprs <- LocatedIf val exprs _loc
pattern VarDef :: String -> Value -> Expr
pattern VarDef name expr <- LocatedVarDef name expr _loc
-- Inform GHC that matching ‘If’ and ‘VarDef’ is just as good
-- as matching ‘LocatedIf’ and ‘LocatedVarDef’.
{-# Complete If, VarDef #-}
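To check that the synonyms behave as described, here is a runnable sketch. The Value type is a hypothetical stand-in for the question's, definedNames and location are illustrative helpers, and the ignored field is matched with a plain _ wildcard:

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Hypothetical stand-in for the question's Value type.
data Value = IntV Int | BoolV Bool deriving (Eq, Show)

type Location = (Int, Int)

data Expr
  = LocatedIf Value [Expr] Location
  | LocatedVarDef String Value Location
  deriving (Show)

-- Unidirectional pattern synonyms which ignore the location.
pattern If :: Value -> [Expr] -> Expr
pattern If val exprs <- LocatedIf val exprs _

pattern VarDef :: String -> Value -> Expr
pattern VarDef name val <- LocatedVarDef name val _

{-# COMPLETE If, VarDef #-}

-- Names defined anywhere in a program; no location is mentioned.
definedNames :: [Expr] -> [String]
definedNames = concatMap go
  where
    go (If _ body)     = definedNames body
    go (VarDef name _) = [name]

-- The full constructors still expose the location for error messages.
location :: Expr -> Location
location (LocatedIf _ _ loc)     = loc
location (LocatedVarDef _ _ loc) = loc
```

definedNames never mentions a location, and thanks to the COMPLETE pragma GHC treats its match as exhaustive; location shows that the underlying constructors remain available wherever an error message needs them.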
This may be sufficiently tidy for your purposes already. But here are a few more tips that I find helpful.
Put annotations first: when adding an annotation type to an AST directly, I often prefer to place it as the first parameter of each constructor, so that it can be conveniently partially applied.
data LocatedExpr
  = LocatedIf Location Value [LocatedExpr]
  | LocatedVarDef Location String Value
If the annotation is a location, then this also makes it more convenient to obtain when writing certain kinds of parsers, along the lines of LocatedIf <$> (getSourceLocation <* ifKeyword) <*> value <*> many expr in a parser combinator library.
Parameterise your annotations: I often make the annotation type into a type parameter, so that GHC can derive some useful classes for me:
{-# Language
DeriveFoldable,
DeriveFunctor,
DeriveTraversable #-}
import Data.Foldable (toList)
data AnnotatedExpr a
  = AnnotatedIf a Value [AnnotatedExpr a]
  | AnnotatedVarDef a String Value
  deriving (Functor, Foldable, Traversable)
type LocatedExpr = AnnotatedExpr Location
-- Get the annotation of an expression: the derived Foldable
-- visits the annotation field first.
-- (Total as long as every constructor is annotated in its first field.)
exprAnnotation :: AnnotatedExpr a -> a
exprAnnotation = head . toList
-- Update annotations purely.
mapAnnotations
  :: (a -> b)
  -> AnnotatedExpr a -> AnnotatedExpr b
mapAnnotations = fmap
-- traverse, foldMap, &c.
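A runnable version of this idea, assuming a hypothetical Value type. The child expressions are themselves AnnotatedExpr a, so the derived Functor and Foldable instances reach every node rather than only the root; allAnnotations and shiftLines are illustrative names:

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}

import Data.Foldable (toList)

-- Hypothetical stand-in for the question's Value type.
data Value = IntV Int | BoolV Bool deriving (Eq, Show)

type Location = (Int, Int)

-- Annotation first; children carry the same parameter, so the derived
-- instances visit the annotation of every node, not only the root.
data AnnotatedExpr a
  = AnnotatedIf a Value [AnnotatedExpr a]
  | AnnotatedVarDef a String Value
  deriving (Eq, Show, Functor, Foldable, Traversable)

type LocatedExpr = AnnotatedExpr Location

-- The derived Foldable visits fields left to right, so the root's
-- annotation comes first.
exprAnnotation :: AnnotatedExpr a -> a
exprAnnotation e = case toList e of
  (a : _) -> a
  []      -> error "unreachable: every constructor carries an annotation"

-- Every annotation in the tree, in source order.
allAnnotations :: AnnotatedExpr a -> [a]
allAnnotations = toList

-- Shift every line number, e.g. after prepending a header line.
shiftLines :: Int -> LocatedExpr -> LocatedExpr
shiftLines n = fmap (\(line, col) -> (line + n, col))
```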
If you want “doesn’t interfere”, use polymorphism: you can enforce that the evaluator can’t inspect the annotation type by being polymorphic over it. Pattern synonyms still let you match on these expressions conveniently:
pattern If :: Value -> [AnnotatedExpr a] -> AnnotatedExpr a
pattern If val exprs <- AnnotatedIf _anno val exprs
-- …
eval :: AnnotatedExpr a -> Value
eval expr = case expr of
  If val exprs -> -- …
  VarDef name val -> -- …
Unannotated terms aren’t your enemy: a term without source locations is no good for error reporting, but I think it’s still a good idea to make the pattern synonyms bidirectional for the convenience of constructing unannotated terms with a unit () annotation. (Or something equivalent, if you use e.g. Maybe Location as the annotation type.)
The reason is that this is quite convenient for writing unit tests, where you want to check the output, but want to use Eq instead of pattern matching, and don’t want to have to compare all the source locations in tests that aren’t concerned with them. Using the derived classes, void :: (Functor f) => f a -> f () strips out all the annotations on an AST.
import Control.Monad (void)
type BareExpr = AnnotatedExpr ()
-- One way to define bidirectional synonyms, so e.g.
-- ‘If’ can be used as either a pattern or a constructor.
pattern If :: Value -> [BareExpr] -> BareExpr
pattern If val exprs = AnnotatedIf () val exprs
-- …
stripAnnotations :: AnnotatedExpr a -> BareExpr
stripAnnotations = void
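A self-contained sketch of the testing workflow described above, assuming a hypothetical Value type and a hand-written parsed value standing in for real parser output:

```haskell
{-# LANGUAGE DeriveFunctor, PatternSynonyms #-}

import Control.Monad (void)

-- Hypothetical stand-in for the question's Value type.
data Value = IntV Int | BoolV Bool deriving (Eq, Show)

type Location = (Int, Int)

data AnnotatedExpr a
  = AnnotatedIf a Value [AnnotatedExpr a]
  | AnnotatedVarDef a String Value
  deriving (Eq, Show, Functor)

type BareExpr = AnnotatedExpr ()

-- Implicitly bidirectional: usable both as patterns and as constructors.
pattern If :: Value -> [BareExpr] -> BareExpr
pattern If val exprs = AnnotatedIf () val exprs

pattern VarDef :: String -> Value -> BareExpr
pattern VarDef name val = AnnotatedVarDef () name val

-- void replaces every annotation in the tree with ().
stripAnnotations :: AnnotatedExpr a -> BareExpr
stripAnnotations = void

-- Stand-in for real parser output, with source locations.
parsed :: AnnotatedExpr Location
parsed = AnnotatedIf (1, 1) (BoolV True) [AnnotatedVarDef (2, 5) "x" (IntV 1)]
```

In a test, stripAnnotations parsed == If (BoolV True) [VarDef "x" (IntV 1)] compares structure only, so reformatting the source text would not break the assertion.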
Equivalently, you could use GADTs / ExistentialQuantification to say data AnyExpr where { AnyExpr :: AnnotatedExpr a -> AnyExpr } / data AnyExpr = forall a. AnyExpr (AnnotatedExpr a); that way, the annotations have exactly as much information as (), but you don’t need to fmap over the entire tree with void in order to strip it, just apply the AnyExpr constructor to hide the type.
Finally, here are some brief introductions to a few AST typing solutions.
Annotate each AST node with a tag (e.g. a unique ID), then store all metadata like source locations, types, and whatever else, separately from the AST:
import Data.IntMap (IntMap)
-- More sophisticated, more strongly typed tags are possible.
newtype Tag = Tag Int
newtype TagMap a = TagMap (IntMap a)
data Expr
  = If !Tag Value [Expr]
  | VarDef !Tag String Expr
type Span = (Location, Location)
type SourceMap = TagMap Span
type CommentMap = TagMap (Span, String)
parse
  :: String             -- Input
  -> Either ParseError
       ( Expr           -- Parsed expression
       , SourceMap      -- Source locations of tags
       , CommentMap     -- Sideband for comments
       -- …
       )
The advantage is that you can very easily mix in arbitrary new types of annotations anywhere, without affecting the AST itself, and avoid rewriting the AST just to change annotations. You can think of the tree and annotation tables as a kind of database, where the tags are the “foreign keys” relating them. A downside is that you must be careful to maintain these tags when you do rewrite the AST.
I don’t know if this approach has an established name; I think of it as just “tagging” or a “tagged AST”.
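A minimal sketch of how the side table is consulted, assuming a hypothetical Value type and a hand-built SourceMap in place of real parser output. lookupTag and badConditions are illustrative names, and VarDef holds a Value here rather than an Expr:

```haskell
import Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap

type Location = (Int, Int)
type Span = (Location, Location)

newtype Tag = Tag Int deriving (Eq, Show)
newtype TagMap a = TagMap (IntMap a)
type SourceMap = TagMap Span

-- Hypothetical stand-in for the question's Value type.
data Value = IntV Int | BoolV Bool deriving (Eq, Show)

-- The AST carries tags only; all metadata lives in side tables.
data Expr
  = If !Tag Value [Expr]
  | VarDef !Tag String Value
  deriving (Eq, Show)

lookupTag :: Tag -> TagMap a -> Maybe a
lookupTag (Tag k) (TagMap m) = IntMap.lookup k m

-- Find If nodes whose condition is not a boolean and report their spans.
-- The traversal never stores locations; it only looks them up by tag.
badConditions :: SourceMap -> Expr -> [Maybe Span]
badConditions sm expr = case expr of
  If _ (BoolV _) body -> concatMap (badConditions sm) body
  If tag _ body       -> lookupTag tag sm : concatMap (badConditions sm) body
  VarDef {}           -> []
```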
recursion-schemes and/or Data Types à la Carte: separate out the “recursive” part of an annotated expression tree from the “annotation” part, and use Fix to tie them back together, with Compose (or Cofree) to add annotations in the middle.
data ExprF e
  = IfF Value [e]
  | VarDefF String e
  -- …
  deriving (Foldable, Functor, Traversable, …)
-- Unannotated: Expr ~ ExprF (ExprF (ExprF (…)))
type Expr = Fix ExprF
-- With a location at each recursive step:
--
-- LocatedExpr ~ Located (ExprF (Located (ExprF (…))))
type LocatedExpr = Fix (Compose Located ExprF)
data Located a = Located Location a
  deriving (Foldable, Functor, Traversable, …)
-- or: type Located = (,) Location
A distinct advantage is that you get a bunch of nice traversal stuff like cata for free-ish, so you can avoid having to write manual traversals over your AST over and over. A downside is that it adds some pattern clutter to clean up, as does the “à la carte” approach, but they do offer a lot of flexibility.
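A dependency-free sketch of the cata idea. Fix and cata are defined by hand here (recursion-schemes provides its own versions), Value is a hypothetical stand-in, VarDefF holds a Value rather than a child expression, and ifE, varDef, size and names are illustrative names:

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Hand-rolled Fix and cata, so the sketch has no package dependencies.
newtype Fix f = Fix (f (Fix f))

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (Fix layer) = alg (fmap (cata alg) layer)

-- Hypothetical stand-in for the question's Value type.
data Value = IntV Int | BoolV Bool deriving (Eq, Show)

-- One layer of the AST; e marks the recursive positions.
data ExprF e
  = IfF Value [e]
  | VarDefF String Value
  deriving (Functor)

type Expr = Fix ExprF

-- Smart constructors hiding the Fix wrapper.
ifE :: Value -> [Expr] -> Expr
ifE v body = Fix (IfF v body)

varDef :: String -> Value -> Expr
varDef name v = Fix (VarDefF name v)

-- Each algebra handles a single layer; cata supplies the recursion.
size :: Expr -> Int
size = cata alg
  where
    alg (IfF _ childSizes) = 1 + sum childSizes
    alg (VarDefF _ _)      = 1

names :: Expr -> [String]
names = cata alg
  where
    alg (IfF _ childNames) = concat childNames
    alg (VarDefF name _)   = [name]
```

Neither size nor names contains an explicit recursive call; that is the traversal you get "for free-ish" from cata.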
Trees That Grow is overkill for just source locations, but in a serious compiler it’s quite helpful. If you expect to have more than one annotation type (such as inferred types or other analysis results) or an AST that changes over time, then you add a type parameter for the “compilation phase” (parsed, renamed, typechecked, desugared, &c.) and select field types or enable & disable constructors based on that index.
A really unfortunate downside of this is that you often have to rewrite the tree even in places nothing has changed, because everything depends on the “phase”. An alternative that I use is to add one type parameter for each type of phase or annotation that can vary independently, e.g. data Expr annotation termVarName typeVarName, and abstract over that with type and pattern synonyms. This lets you update indices independently and still use classes like Functor and Bitraversable.
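A toy sketch of phase-indexed annotations using a closed type family. Phase, Ty, Anno, typecheck and annotation are hypothetical names; full Trees That Grow designs typically use a separate extension type family per constructor rather than a single Anno:

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies #-}

import Data.Kind (Type)

type Location = (Int, Int)

-- Hypothetical phases and types, for illustration only.
data Phase = Parsed | Typed

data Ty = TInt | TBool deriving (Eq, Show)

-- The annotation carried by every node depends on the phase.
type family Anno (p :: Phase) :: Type where
  Anno 'Parsed = Location
  Anno 'Typed  = (Location, Ty)

data Expr (p :: Phase)
  = Lit (Anno p) Int
  | BoolLit (Anno p) Bool
  | Add (Anno p) (Expr p) (Expr p)

annotation :: Expr p -> Anno p
annotation (Lit a _)     = a
annotation (BoolLit a _) = a
annotation (Add a _ _)   = a

typeOf :: Expr 'Typed -> Ty
typeOf e = snd (annotation e)

-- A toy checker. Changing the phase index forces every node to be
-- rebuilt, even the leaves whose content does not change.
typecheck :: Expr 'Parsed -> Either String (Expr 'Typed)
typecheck (Lit loc n)     = Right (Lit (loc, TInt) n)
typecheck (BoolLit loc b) = Right (BoolLit (loc, TBool) b)
typecheck (Add loc a b)   = do
  a' <- typecheck a
  b' <- typecheck b
  if typeOf a' == TInt && typeOf b' == TInt
    then Right (Add (loc, TInt) a' b')
    else Left ("expected Int operands at " ++ show loc)
```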

Limiting the set of types in a new data type like `Tree a`

While exploring and studying the Haskell type system, I've run into some problems.
1) Consider a polymorphic type such as a binary tree:
data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving Show
Suppose, for example, that I want to restrict myself to Tree Int, Tree Bool and Tree Char. Of course, I can make a new type like this:
data TreeIWant = T1 (Tree Int) | T2 (Tree Bool) | T3 (Tree Char) deriving Show
But is it possible to make a new restricted type (for homogeneous trees) in a more elegant way, without new tags like T1, T2, T3 (perhaps with some advanced type extensions)?
2) The second question is about trees with heterogeneous values. I can build them in ordinary Haskell by making a helper type that contains tagged heterogeneous values:
data HeteroValues = H1 Int | H2 Bool | H3 Char deriving Show
and then making a tree with values of this type:
type TreeH = Tree HeteroValues
But is it possible to make a new type (for heterogeneous trees) in a more elegant way, without new tags like H1, H2, H3 (perhaps with some advanced type extensions)?
I know about heterogeneous lists; perhaps this is the same question?
For question #2, it's easy to construct a "restricted" heterogeneous type without explicit tags using a GADT and a type class:
{-# LANGUAGE GADTs #-}
data Thing where
  T :: THING a => a -> Thing
class THING a
Now, declare THING instances for the things you want to allow:
instance THING Int
instance THING Bool
instance THING Char
and you can create Things and lists (or trees) of Things:
> t1 = T 'a' -- Char is okay
> t2 = T "hello" -- but String is not
... type error ...
> tl = [T (42 :: Int), T True, T 'x']
> tt = Branch (Leaf (T 'x')) (Leaf (T False))
>
In terms of the type names in your question, you have:
type HeteroValues = Thing
type TreeH = Tree Thing
You can use the same type class with a new GADT for question #1:
data ThingTree where
  TT :: THING a => Tree a -> ThingTree
and you have:
type TreeIWant = ThingTree
and you can do:
> tt1 = TT $ Branch (Leaf 'x') (Leaf 'y')
> tt2 = TT $ Branch (Leaf 'x') (Leaf False)
... type error ...
>
That's all well and good, until you try to use any of the values you've constructed. For example, if you wanted to write a function to extract a Bool from a possibly boolish Thing:
maybeBool :: Thing -> Maybe Bool
maybeBool (T x) = ...
you'd find yourself stuck here. Without a "tag" of some kind, there's no way of determining if x is a Bool, Int, or Char.
Actually, though, you do have an implicit tag available, namely the THING type class dictionary for x. So, you can write:
maybeBool :: Thing -> Maybe Bool
maybeBool (T x) = maybeBool' x
and then implement maybeBool' in your type class:
class THING a where
  maybeBool' :: a -> Maybe Bool
instance THING Int where
  maybeBool' _ = Nothing
instance THING Bool where
  maybeBool' = Just
instance THING Char where
  maybeBool' _ = Nothing
and you're golden!
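Putting the pieces together, here is a compilable sketch of the type-class approach, reusing the Tree type from the question; leaves is a hypothetical helper:

```haskell
{-# LANGUAGE GADTs #-}

-- The Tree type from the question.
data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving Show

-- Only types with a THING instance can be wrapped in T; the instance
-- dictionary acts as the implicit tag.
class THING a where
  maybeBool' :: a -> Maybe Bool

instance THING Int where
  maybeBool' _ = Nothing

instance THING Bool where
  maybeBool' = Just

instance THING Char where
  maybeBool' _ = Nothing

data Thing where
  T :: THING a => a -> Thing

maybeBool :: Thing -> Maybe Bool
maybeBool (T x) = maybeBool' x

-- Hypothetical helper: the leaves of a tree, left to right.
leaves :: Tree a -> [a]
leaves (Leaf x)     = [x]
leaves (Branch l r) = leaves l ++ leaves r
```

Each new operation of this kind means another class method plus an implementation in every instance, which is where the boilerplate comes from.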
Of course, if you'd used explicit tags:
data Thing = T_Int Int | T_Bool Bool | T_Char Char
then you could skip the type class and write:
maybeBool :: Thing -> Maybe Bool
maybeBool (T_Bool x) = Just x
maybeBool _ = Nothing
In the end, it turns out that the best Haskell representation of an algebraic sum of three types is just an algebraic sum of three types:
data Thing = T_Int Int | T_Bool Bool | T_Char Char
Trying to avoid the need for explicit tags will probably lead to a lot of inelegant boilerplate elsewhere.
Update: As @DanielWagner pointed out in a comment, you can use Data.Typeable in place of this boilerplate (effectively, have GHC generate a lot of boilerplate for you), so you can write:
{-# LANGUAGE GADTs #-}
import Data.Typeable
data Thing where
  T :: THING a => a -> Thing
class Typeable a => THING a
instance THING Int
instance THING Bool
instance THING Char
maybeBool :: Thing -> Maybe Bool
maybeBool (T x) = cast x
This perhaps seems "elegant" at first, but if you try this approach in real code, I think you'll regret losing the ability to pattern match on Thing constructors at usage sites (and so having to substitute chains of casts and/or comparisons of TypeReps).

Haskell - Using one data type kind in another

Haskell newbie here. I want to declare a type Val whose values can be IntVal, StringVal or FloatVal, and a type List whose values can be StringList, IntList or FloatList, with elements that are (correspondingly) StringVal, IntVal and FloatVal values.
My attempt so far:
data Val = IntVal Int
         | FloatVal Float
         | StringVal String deriving Show
data List = IntList [(IntVal Int)]
          | FloatList [(FloatVal Float)]
          | StringList [(StringVal String)] deriving Show
fails with the error:
Not in scope: type constructor or class ‘IntVal’
A data constructor of that name is in scope; did you mean DataKinds?
data List = IntList [(IntVal Int)]
... (similarly for StringVal, FloatVal..)
what is the right way to achieve this?
PS:
declaring List as data List = List [Val] ends up allowing lists like the following:
l = [(IntVal 10),(StringVal "Hello")], which I do not want to allow.
I want every element of a list to be a value of the same kind.
There is a solution using GADTs. The problem is that IntVal etc are not actually types, they are just constructors (basically functions that also support pattern matching) for the single type Val. So once you have made a Val, the information about which kind of value it is is completely lost at the type level (that is, compile time).
The trick is to tag Val with the type it contains.
data Val a where
  IntVal    :: Int    -> Val Int
  FloatVal  :: Float  -> Val Float
  StringVal :: String -> Val String
Then if you have a plain list [Val a] it will already be homogeneous. If you must:
data List = IntList [Val Int]
          | FloatList [Val Float]
          ...
which is slightly different in that it "erases" the type of the list, and it can distinguish between an empty list of ints and an empty list of floats, for example. You could also use the same GADT trick with List:
data List a where
  IntList :: [Val Int] -> List Int
  FloatList :: [Val Float] -> List Float
  ...
but in that case I think a better design is probably the simpler
newtype List a = List [Val a]
The trade-offs between all these different designs really depend on what you are planning to do with them.
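A compilable sketch of the GADT design combined with the newtype List a; unVal, sumInts and joinStrings are hypothetical helpers for illustration:

```haskell
{-# LANGUAGE GADTs #-}

-- Each constructor records the type it holds in the index.
data Val a where
  IntVal    :: Int    -> Val Int
  FloatVal  :: Float  -> Val Float
  StringVal :: String -> Val String

-- Homogeneous by construction: every element shares the same index a.
newtype List a = List [Val a]

-- Recover the payload; matching on the GADT refines a in each branch.
unVal :: Val a -> a
unVal (IntVal n)    = n
unVal (FloatVal x)  = x
unVal (StringVal s) = s

-- Hypothetical consumers that rely on the element type.
sumInts :: List Int -> Int
sumInts (List vs) = sum (map unVal vs)

joinStrings :: List String -> String
joinStrings (List vs) = concatMap unVal vs
```

With these definitions, List [IntVal 10, StringVal "Hello"] is rejected at compile time, because IntVal 10 :: Val Int and StringVal "Hello" :: Val String cannot share one element type.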

Haskell: type, newtype or data for only an uppercase char

I want to make a String type that holds only uppercase characters. I know that String is [Char]. I have tried something like type a = ['A'..'Z'] but it did not work. Any help?
What you want is dependent types, which Haskell doesn't have. Dependent types are types that depend on values, so with dependent types you could encode, at the type level, a vector of length 5 as in
only5 :: Vector 5 a -> Vector 10 a
only5 vec = concatenate vec vec
Again, Haskell does not have dependent types, but languages like Agda, Coq and Idris do support them. Instead, you could just use a "smart constructor":
module MyModule
( Upper -- export type only, not constructor
, mkUpper -- export the smart constructor
) where
import Data.Char (isUpper)
newtype Upper = Upper String deriving (Eq, Show, Read, Ord)
mkUpper :: String -> Maybe Upper
mkUpper s = if all isUpper s then Just (Upper s) else Nothing
Here the constructor Upper is not exported, just the type, and then users of this module have to use the mkUpper function that safely rejects non-uppercase strings.
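A single-file sketch of how callers use the smart constructor. The export list cannot be demonstrated within one file, so here the Upper constructor is technically still visible; getUpper and shout are hypothetical names:

```haskell
import Data.Char (isUpper)

newtype Upper = Upper String deriving (Eq, Show, Read, Ord)

-- The smart constructor from the answer: the only way, outside the
-- defining module, to obtain an Upper.
mkUpper :: String -> Maybe Upper
mkUpper s = if all isUpper s then Just (Upper s) else Nothing

-- Hypothetical accessor, usually exported alongside the type.
getUpper :: Upper -> String
getUpper (Upper s) = s

-- Functions taking Upper can rely on the invariant without re-checking.
shout :: Upper -> String
shout u = getUpper u ++ "!"
```

Note that mkUpper "" returns Just (Upper ""), since all isUpper "" is True; reject the empty string explicitly if that matters to you.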
For clarification, and to show how awesome dependent types can be, consider the mysterious concatenate function from above. If I were to define this with dependent types, it would actually look something like
concatenate :: Vector n a -> Vector m a -> Vector (n + m) a
concatenate v1 v2 = undefined
Wait, what's arithmetic doing in a type signature? It's actually performing type-system level computations on the values that this type is dependent on. This removes a lot of potential boilerplate in Haskell, and it makes guarantees at compilation time that, e.g., arrays can't have negative length.
Most desires for dependent types can be met either by using smart constructors (see bheklilr's answer), by generating Haskell from an external tool (Coq, Isabelle, Inch, etc.), or by using an exact representation. You probably want the first solution.
To represent exactly the capitals, you could write a data type that has a constructor for each letter, plus conversion to and from strings:
data Capital = CA | CB | CC | CD | CE | CF | CG | CH | CI | CJ | CK | CL | CM | CN | CO | CP | CQ | CR | CS | CT | CU | CV | CW | CX | CY | CZ deriving (Eq, Ord, Enum)
toString :: [Capital] -> String
toString = map (toEnum . (+ (fromEnum 'A')) . fromEnum)
You can even go a step further and allow conversion from string literals, "Anything in quotes", to the type [Capital] by using the OverloadedStrings extension. Just add {-# LANGUAGE OverloadedStrings, FlexibleInstances #-} to the top of your file, import Data.String and write the instance below. (On GHC 8.0 and later, base already has an instance (a ~ Char) => IsString [a], so this more specific instance needs the OVERLAPPING pragma to be selected.)
type Capitals = [Capital]
instance {-# OVERLAPPING #-} IsString Capitals where
  fromString = map (toEnum . subtract (fromEnum 'A') . fromEnum) . filter (\x -> 'A' <= x && x <= 'Z')
After that, you can type capitals all you want!
*Main> toString ("jfoeaFJOEW" :: Capitals)
"FJOEW"
*Main>
bheklilr is correct but perhaps for your purposes the following could be OK:
import Data.Char (toUpper)
newtype UpperChar = UpperChar Char
  deriving (Show)
upperchar :: Char -> UpperChar
upperchar = UpperChar . toUpper
You can alternatively make UpperChar an alias of Char (use type instead of newtype), which would allow you to form lists that mix Char and UpperChar. The problem with an alias, however, is that you could feed any Char into a function expecting an UpperChar... Note also that toUpper leaves non-letters such as '1' unchanged, so upperchar normalizes case rather than guaranteeing a letter.
One way to do something similar which will work well for the Latin script of your choice but not so well as a fully general solution is to use a custom type to represent upper case letters. Something like this should do the trick:
data UpperChar = A|B|C|D| (fill in the rest) | Y | Z deriving (Enum, Eq, Ord, Show)
newtype UpperString = UpperString [UpperChar]
instance Show UpperString where
  show (UpperString s) = concatMap show s
The members of this type are not Haskell Strings, but you can convert between them as needed.

haskell - types - functions - trees

For Haskell practice, I want to implement a game in which students can learn some algebra playfully.
As the basic data type I want to use a tree:
with nodes that store labels and algebraic operators,
with leaves that store labels and either variables (type String) or numbers.
Now I want to define something like
data Tree = Leaf {l :: Label, val :: Expression}
          | Node {l :: Label, f :: Fun, lBranch :: Tree, rBranch :: Tree}
data Fun = "one of [(+),(*),(-),(/),(^)]"
-- type Fun = Int -> Int would work
Next, I want to define an 'equivalence' of trees, since multiplication and addition are commutative, repeated additions can be simplified to multiplication, and so on for the whole range of algebraic operations.
I also need to search through the tree; I think searching by label is best. Is this a good approach?
Any ideas on what tags/phrases to look for, and how to define "data Fun"?
To expand a bit on Edward Z. Yang's answer:
The simplest way to define your operators here is probably as a data type, along with the types for atomic values in leaf nodes and the expression tree as a whole:
data Fun = Add | Mul | Sub | Div | Exp deriving (Eq, Ord, Show)
data Val a = Lit a | Var String deriving (Eq, Ord, Show)
data ExprTree a = Node String Fun (ExprTree a) (ExprTree a)
                | Leaf String (Val a)
                deriving (Eq, Ord, Show)
You can then define ExprTree a as an instance of Num and whatnot:
instance (Num a) => Num (ExprTree a) where
  (+) = Node "" Add
  (*) = Node "" Mul
  (-) = Node "" Sub
  negate = Node "" Sub 0
  fromInteger = Leaf "" . Lit . fromInteger
  -- abs and signum are left out here (GHC warns about the missing methods)
...which allows creating unlabelled expressions in a very natural way:
*Main> :t 2 + 2
2 + 2 :: (Num t) => t
*Main> 2 + 2 :: ExprTree Int
Node "" Add (Leaf "" (Lit 2)) (Leaf "" (Lit 2))
Also, note the deriving clauses above on the data definitions, particularly Ord; this tells the compiler to automatically create an ordering relation on values of that type. This lets you sort them consistently, which means you can, for instance, define a canonical ordering on subexpressions so that when rearranging commutative operations you don't get stuck in a loop. Given some canonical reductions and subexpressions in canonical order, in most cases you'll then be able to use the automatic equality relation given by Eq to check for subexpression equivalence.
Note that labels will affect the ordering and equality here. If that's not desired, you'll need to write your own definitions for Eq and Ord, much like the one I gave for Num.
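As a sketch of the canonical-ordering idea, the function below reorders the operands of the commutative operators Add and Mul using the derived Ord instance. canonical, commutative and equivalent are hypothetical names, and this only detects equivalence up to swapping operands, not associativity or other algebraic laws:

```haskell
data Fun = Add | Mul | Sub | Div | Exp deriving (Eq, Ord, Show)
data Val a = Lit a | Var String deriving (Eq, Ord, Show)
data ExprTree a = Node String Fun (ExprTree a) (ExprTree a)
                | Leaf String (Val a)
                deriving (Eq, Ord, Show)

commutative :: Fun -> Bool
commutative f = f == Add || f == Mul

-- Bottom-up: canonicalise both children, then put the smaller one
-- (by the derived Ord) on the left if the operator is commutative.
canonical :: Ord a => ExprTree a -> ExprTree a
canonical (Leaf lbl v) = Leaf lbl v
canonical (Node lbl f l r)
  | commutative f && r' < l' = Node lbl f r' l'
  | otherwise                = Node lbl f l' r'
  where
    l' = canonical l
    r' = canonical r

-- Equivalent up to reordering the operands of + and *.
equivalent :: Ord a => ExprTree a -> ExprTree a -> Bool
equivalent x y = canonical x == canonical y
```

Since labels are compared too, two trees that differ only in labels will not be considered equivalent by this sketch.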
After that, you can write some traversal and reduction functions, to do things like apply operators, perform variable substitution, etc.
It looks like you want to construct a symbolic algebra system. There is a large and varied literature on the subject.
You don't want to represent operators as Int -> Int, because then you can't check which operation a given function implements, and so you can't implement peephole optimizations for things like simplification. A simple enumerated data type does the trick; then write an eval function that actually evaluates your tree.
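A minimal eval along those lines, assuming Double for numbers and a Data.Map environment for variables (both are assumptions, not part of the answers above); the hypothetical apply function is the single place where the Fun tags meet real arithmetic:

```haskell
import qualified Data.Map as Map

data Fun = Add | Mul | Sub | Div | Exp deriving (Eq, Ord, Show)
data Val a = Lit a | Var String deriving (Eq, Ord, Show)
data ExprTree a = Node String Fun (ExprTree a) (ExprTree a)
                | Leaf String (Val a)
                deriving (Eq, Ord, Show)

-- Interpret each operator tag.
apply :: Fun -> Double -> Double -> Double
apply Add = (+)
apply Mul = (*)
apply Sub = (-)
apply Div = (/)
apply Exp = (**)

-- Evaluate with a variable environment; Nothing if a variable is unbound.
eval :: Map.Map String Double -> ExprTree Double -> Maybe Double
eval _   (Leaf _ (Lit x)) = Just x
eval env (Leaf _ (Var v)) = Map.lookup v env
eval env (Node _ f l r)   = apply f <$> eval env l <*> eval env r
```

Because operators stay as data, a simplifier can still pattern match on Node _ Mul ... before anything is evaluated.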

Resources