to overload a data type or to use a similar one? - haskell

This is more of a question about a programming style and common practices. But I feel that it does not fit into the code review forum...
My program parses regular expressions and processes them. A regular expression can have the usual elements (Kleene closure, concatenation, etc) and it also can have references to other regular expressions by their names, like macros:
data Regex a = Epsilon
| Literal a
| Ranges [(a, a)]
| Ref String
| Then (Regex a) (Regex a)
| Or (Regex a) (Regex a)
| Star (Regex a)
After I process a regular expression and resolve all macro references, and convert Literal elements to Range elements (this is needed for my purposes), I end up with a type that cannot and must not have Ref and Literal, so in my functions that work with it I do something like:
foo (Literal _) = error "unexpected literal"
foo (Ref _) = error "unexpected reference"
foo (Epsilon) = ...
foo (Star x) = ...
...
This looks ugly to me because it does runtime checks instead of checks during compilation. Not a very haskell kind of approach.
So maybe I can introduce another data type which is very similar to the original one and use that?
data RegexSimple a = Epsilon2
| Ranges2 [(a, a)]
| Then2 (Regex a) (Regex a)
| Or2 (Regex a) (Regex a)
| Star2 (Regex a)
That would work, but here I have a lot of duplication and also the nice and descriptive names of constructors are taken now and I need to invent new ones...
What would the experts do here? I want to learn : )

I don't know what the rest of your code looks like, so this solution may require you to rethink certain aspects, but the most "haskell-ish" solution to this problem would probably be to use GADTs and phantom types. Together, they basically allow you to create arbitrary subtypes for more flexible type safety. You would redefine your types like so.
{-# LANGUAGE GADTs #-}
data Literal
data Ref
data Rangeable
data Regex t a where
Epsilon :: Regex Rangeable a
Literal :: a -> Regex Literal a
Ranges :: [(a, a)] -> Regex Rangeable a
Ref :: String -> Regex Ref a
Then :: Regex t' a -> Regex t' a -> Regex Rangeable a
Or :: Regex t' a -> Regex t' a -> Regex Rangeable a
Star :: Regex t' a -> Regex Rangeable
Then you can define
foo :: Regex Rangeable a
foo (Epsilon) = ...
foo s#(Star a) = ...
Now, statements like foo $ Literal 'c' will fail compile-time type-checks.

I'm not an expert but it's a problem I have also myself (even though it more with product type than sum type).
The obvious solution is to reuse RegexSimple in Regex so that
data Regex a = Ref a | Literal a | SimpleR (SimpleRegex a)
another way is to parametrize Regex with a functor
data Regex f a = Literal (f a) | Ref (f a) | Epsilon a ...
and use either Regex Id or Regex Void.
Another way is just to use Maybe
data Regex a = Literal (Maybe a) | Epsilon a ...
But this it less clean because you can't enforce a function to only accept simple regex.

Related

Type design for the AST of my language remembering token locations

I wrote a parser and evaluator for a simple programming language. Here is a simplified version of the types for the AST:
data Value = IntV Int | FloatV Float | BoolV Bool
data Expr = IfE Value [Expr] | VarDefE String Value
type Program = [Expr]
I want error messages to tell the line and column of the source code in which the error occured. For example, if the value in an If expression is not a boolean, I want the evaluator to show an error saying "expected boolean at line x, column y", with x and y referring to the location of the value.
So, what I need to do is redefine the previous types so that they can store the relevant locations of different things. One option would be to add a location to each constructor for expressions, like so:
type Location = (Int, Int)
data Expr = IfE Value [Expr] Location | VarDef String Value Location
This clearly isn't optimal, because I have to add those Location fields to every possible expression, and if for example a value contained other values, I would need to add locations to that value too:
{-
this would turn into FunctionCall String [Value] [Location],
with one location for each value in the function call
-}
data Value = ... | FunctionCall String [Value]
I came up with another solution, which allows me to add locations to everything:
data Located a = Located Location a
type LocatedExpr = Located Expr
type LocatedValue = Located Value
data Value = IntV Int | FloatV Float | BoolV Bool | FunctionCall String [LocatedValue]
data Expr = IfE LocatedValue [LocatedExpr] | VarDef String LocatedValue
data Program = [LocatedExpr]
However I don't like this that much. First of all, it clutters the definition of the evaluator and pattern matching has an extra layer every time. Also, I don't think saying that a function call takes located values as arguments is quite right. Function calls should take values as arguments, and locations should be metadata that doesn't interfere with the evaluator.
I need help redefining my types so that the solution is as clean as possible. Maybe there is a language extension or a design pattern I don't know about that could be helpful.
There are many ways to annotate an AST! This is half of what’s known as the AST typing problem, the other half being how you manage an AST that changes over the course of compilation. The problem isn’t exactly “solved”: all of the solutions have tradeoffs, and which one to pick depends on your expected use cases. I’ll go over a few that you might like to investigate at the end.
Whichever method you choose for organising the actual data types, if it makes pattern-matching ugly or unwieldy, the natural solution is PatternSynonyms.
Considering your first example:
{-# Language PatternSynonyms #-}
type Location = (Int, Int)
data Expr
= LocatedIf Value [Expr] Location
| LocatedVarDef String Value Location
-- Unidirectional pattern synonyms which ignore the location:
pattern If :: Value -> [Expr] -> Expr
pattern If val exprs <- LocatedIf val exprs _loc
pattern VarDef :: String -> Value -> Expr
pattern VarDef name expr <- LocatedVarDef name expr _loc
-- Inform GHC that matching ‘If’ and ‘VarDef’ is just as good
-- as matching ‘LocatedIf’ and ‘LocatedVarDef’.
{-# Complete If, VarDef #-}
This may be sufficiently tidy for your purposes already. But here are a few more tips that I find helpful.
Put annotations first: when adding an annotation type to an AST directly, I often prefer to place it as the first parameter of each constructor, so that it can be conveniently partially applied.
data LocatedExpr
= LocatedIf Location Value [Expr]
| LocatedVarDef Location String Value
If the annotation is a location, then this also makes it more convenient to obtain when writing certain kinds of parsers, along the lines of AnnotatedIf <$> (getSourceLocation <* ifKeyword) <*> value <*> many expr in a parser combinator library.
Parameterise your annotations: I often make the annotation type into a type parameter, so that GHC can derive some useful classes for me:
{-# Language
DeriveFoldable,
DeriveFunctor,
DeriveTraversable #-}
data AnnotatedExpr a
= AnnotatedIf a Value [Expr]
| AnnotatedVarDef a String Value
deriving (Functor, Foldable, Traversable)
type LocatedExpr = AnnotatedExpr Location
-- Get the annotation of an expression.
-- (Total as long as every constructor is annotated.)
exprAnnotation :: AnnotatedExpr a -> a
exprAnnotation = head
-- Update annotations purely.
mapAnnotations
:: (a -> b)
-> AnnotatedExpr a -> AnnotatedExpr b
mapAnnotations = fmap
-- traverse, foldMap, &c.
If you want “doesn’t interfere”, use polymorphism: you can enforce that the evaluator can’t inspect the annotation type by being polymorphic over it. Pattern synonyms still let you match on these expressions conveniently:
pattern If :: Value -> [AnnotatedExpr a] -> AnnotatedExpr a
pattern If val exprs <- AnnotatedIf _anno val exprs
-- …
eval :: AnnotatedExpr a -> Value
eval expr = case expr of
If val exprs -> -- …
VarDef name expr -> -- …
Unannotated terms aren’t your enemy: a term without source locations is no good for error reporting, but I think it’s still a good idea to make the pattern synonyms bidirectional for the convenience of constructing unannotated terms with a unit () annotation. (Or something equivalent, if you use e.g. Maybe Location as the annotation type.)
The reason is that this is quite convenient for writing unit tests, where you want to check the output, but want to use Eq instead of pattern matching, and don’t want to have to compare all the source locations in tests that aren’t concerned with them. Using the derived classes, void :: (Functor f) => f a -> f () strips out all the annotations on an AST.
import Control.Monad (void)
type BareExpr = AnnotatedExpr ()
-- One way to define bidirectional synonyms, so e.g.
-- ‘If’ can be used as either a pattern or a constructor.
pattern If :: Value -> [BareExpr] -> BareExpr
pattern If val exprs = AnnotatedIf () val exprs
-- …
stripAnnotations :: AnnotatedExpr a -> BareExpr
stripAnnotations = void
Equivalently, you could use GADTs / ExistentialQuantification to say data AnyExpr where { AnyExpr :: AnnotatedExpr a -> AnyExpr } / data AnyExpr = forall a. AnyExpr (AnnotatedExpr a); that way, the annotations have exactly as much information as (), but you don’t need to fmap over the entire tree with void in order to strip it, just apply the AnyExpr constructor to hide the type.
Finally, here are some brief introductions to a few AST typing solutions.
Annotate each AST node with a tag (e.g. a unique ID), then store all metadata like source locations, types, and whatever else, separately from the AST:
import Data.IntMap (IntMap)
-- More sophisticated/stronglier-typed tags are possible.
newtype Tag = Tag Int
newtype TagMap a = TagMap (IntMap a)
data Expr
= If !Tag Value [Expr]
| VarDef !Tag String Expr
type Span = (Location, Location)
type SourceMap = TagMap Span
type CommentMap = TagMap (Span, String)
parse
:: String -- Input
-> Either ParseError
( Expr -- Parsed expression
, SourceMap -- Source locations of tags
, CommentMap -- Sideband for comments
-- …
)
The advantage is that you can very easily mix in arbitrary new types of annotations anywhere, without affecting the AST itself, and avoid rewriting the AST just to change annotations. You can think of the tree and annotation tables as a kind of database, where the tags are the “foreign keys” relating them. A downside is that you must be careful to maintain these tags when you do rewrite the AST.
I don’t know if this approach has an established name; I think of it as just “tagging” or a “tagged AST”.
recursion-schemes and/or Data Types à la CartePDF: separate out the “recursive” part of an annotated expression tree from the “annotation” part, and use Fix to tie them back together, with Compose (or Cofree) to add annotations in the middle.
data ExprF e
= IfF Value [e]
| VarDefF String e
-- …
deriving (Foldable, Functor, Traversable, …)
-- Unannotated: Expr ~ ExprF (ExprF (ExprF (…)))
type Expr = Fix ExprF
-- With a location at each recursive step:
--
-- LocatedExpr ~ Located (ExprF (Located (ExprF (…))))
type LocatedExpr = Fix (Compose Located ExprF)
data Located a = Located Location a
deriving (Foldable, Functor, Traversable, …)
-- or: type Located = (,) Location
A distinct advantage is that you get a bunch of nice traversal stuff like cata for free-ish, so you can avoid having to write manual traversals over your AST over and over. A downside is that it adds some pattern clutter to clean up, as does the “à la carte” approach, but they do offer a lot of flexibility.
Trees That GrowPDF is overkill for just source locations, but in a serious compiler it’s quite helpful. If you expect to have more than one annotation type (such as inferred types or other analysis results) or an AST that changes over time, then you add a type parameter for the “compilation phase” (parsed, renamed, typechecked, desugared, &c.) and select field types or enable & disable constructors based on that index.
A really unfortunate downside of this is that you often have to rewrite the tree even in places nothing has changed, because everything depends on the “phase”. An alternative that I use is to add one type parameter for each type of phase or annotation that can vary independently, e.g. data Expr annotation termVarName typeVarName, and abstract over that with type and pattern synonyms. This lets you update indices independently and still use classes like Functor and Bitraversable.

Haskell code prints out a list for ints but not for chars

My code currently looks like this. It is supposed to show the possible first symbols in the regular expression definition given to us beforehand. I am supposed to print these out as a list. For example, if the answer is supposed to be [1,2], it will come out [1,2] but when the answer is supposed to be ['1','2'] it will come out "12" or when it is supposed to be ['a', 'b'] it will come out "ab". What am I doing wrong?
data RE a -- regular expressions over an alphabet defined by 'a'
= Empty -- empty regular expression
| Sym a -- match the given symbol
| RE a :+: RE a -- concatenation of two regular expressions
| RE a :|: RE a -- choice between two regular expressions
| Rep (RE a) -- zero or more repetitions of a regular expression
| Rep1 (RE a) -- one or more repetitions of a regular expression
deriving (Show)
firstMatches :: RE a -> [a]
firstMatches Empty = []
firstMatches (Sym a)= a:list
firstMatches(Rep(a))=firstMatches a
firstMatches(Rep1(a))=firstMatches a
firstMatches (Empty :+: b)= firstMatches b
firstMatches (a :+: _) = firstMatches a
firstMatches (a :|: b)= firstMatches a ++ firstMatches b
You're not doing anything wrong.
String is a type synonym for [Char], so if you try to print a [Char] it will print as a String. This is somewhat of a special case, and it can be a little weird.
Show is the typeclass used to print things as a string. The definition of Show is something like this:
class Show a where
showsPrec :: Int -> a -> ShowS
show :: a -> String
showList :: [a] -> ShowS
The showList function is optional. The documentation states:
The method showList is provided to allow the programmer to give a specialised way of showing lists of values. For example, this is used by the predefined Show instance of the Char type, where values of type String should be shown in double quotes, rather than between square brackets.
So if you define a new type and instantiate Show, you can optionally define a special way to show a list of your type, separate from the way it's normally shown and separate from the way lists are normally shown. Char takes advantage of this, in that a [Char] (or equivalently, a String), is shown with double-quotes instead of as a list of Char values.
I can't think of a way to get it to use the default show for a [Char]. I don't think there is one. A workaround might be to create a newtype wrapping Char with its own Show that uses the default showList implementation, but that doesn't seem appropriate here.
If this is homework, I'd expect the grader to know about this already, and I seriously doubt you'd get marked down for it, especially since the problem doesn't appear to be about show at all.

Factoring out recursion in a complex AST

For a side project I am working on I currently have to deal with an abstract syntax tree and transform it according to rules (the specifics are unimportant).
The AST itself is nontrivial, meaning it has subexpressions which are restricted to some types only. (e.g. the operator A must take an argument which is of type B only, not any Expr. A drastically simplified reduced version of my datatype looks like this:
data Expr = List [Expr]
| Strange Str
| Literal Lit
data Str = A Expr
| B Expr
| C Lit
| D String
| E [Expr]
data Lit = Int Int
| String String
My goal is to factor out the explicit recursion and rely on recursion schemes instead, as demonstrated in these two excellent blog posts, which provide very powerful general-purpose tools to operate on my AST. Applying the necessary factoring, we end up with:
data ExprF a = List [a]
| Strange (StrF a)
| Literal (LitF a)
data StrF a = A a
| B a
| C (LitF a)
| D String
| E [a]
data LitF a = Int Int
| String String
If I didn't mess up, type Expr = Fix ExprF should now be isomorphic to the previously defined Expr.
However, writing cata for these cases becomes rather tedious, as I have to pattern match B a :: StrF a inside of an Str :: ExprF a for cata to be well-typed. For the entire original AST this is unfeasible.
I stumbled upon fixing GADTs, which seems to me like it is a solution to my problem, however the user-unfriendly interface of the duplicated higher-order type classes etc. is quite the unneccessary boilerplate.
So, to sum up my questions:
Is rewriting the AST as a GADT the correct way to go about this?
If yes, how could I transform the example into a well-working version? On a second note, is there better support for higher kinded Functors in GHC now?
If you've gone through the effort of to separate out the recursion in your data type, then you can just derive Functor and you're done. You don't need any fancy features to get the recursion scheme. (As a side note, there's no reason to parameterize the Lit data type.)
The fold is:
newtype Fix f = In { out :: f (Fix f) }
gfold :: (Functor f) => (f a -> a) -> Fix f -> a
gfold alg = alg . fmap (gfold alg) . out
To specify the algebra (the alg parameter), you need to do a case analysis against ExprF, but the alternative would be to have the fold have a dozen or more parameters: one for each data constructor. That wouldn't really save you much typing and would be much harder to read. If you want (and this may require rank-2 types in general), you can package all those parameters up into a record and then you could use record update to update "pre-made" records that provide "default" behavior in various circumstances. There's an old paper Dealing with Large Bananas that takes an approach like this. What I'm suggesting, to be clear, is just wrapping the gfold function above with a function that takes a record, and passes in an algebra that will do the case analysis and call the appropriate field of the record for each case.
Of course, you could use GHC Generics or the various "generic/polytypic" programming libraries like Scrap Your Boilerplate instead of this. You are basically recreating what they do.

Parsec parsing in Haskell

I have 2 parsers:
nexpr::Parser (Expr Double)
sexpr::Parser (Expr String)
How do I build a parser that tries one and then the other if it doesn't work? I can't figure out what to return. There must be a clever way to do this.
Thanks.
EDIT:
Adding a bit more info...
I'm learning Haskel, so I started with :
data Expr a where
N::Double -> Expr Double
S::String -> Expr String
Add::Expr Double -> Expr Double -> Expr Double
Cat::Expr String -> Expr String -> Expr String
then I read about F-algebra (here) and so I changed it to:
data ExprF :: (* -> *) -> * -> * where
N::Double -> ExprF r Double
S::String -> ExprF r String
Add::r Double -> r Double -> ExprF r Double
Cat::r String -> r String -> ExprF r String
with
type Expr = HFix ExprF
so my parse to:
Parser (Expr Double)
is actually:
Parser (ExprF HFix Double)
Maybe I'm biting off more than I can chew...
As noted in the comments, you can have a parser like this
nOrSexpr :: Parser (Either (Expr Double) (Expr String))
nOrSexpr = (Left <$> nexpr) <|> (Right <$> sexpr)
However, I think the reason that you are having this difficulty is because you are not representing your parse tree as a single type, which is the more usual thing to do. Something like this:
data Expr =
ExprDouble Double
| ExprInt Int
| ExprString String
That way you can have parsers for each kind of expression that are all of type Parser Expr. This is the same as using Either but more flexible and maintainable. So you might have
doubleParser :: Parser Expr
doubleParser = ...
intParser :: Parser Expr
intParser = ...
stringParser :: Parser Expr
stringParser = ...
exprParser :: Parser Expr
exprParser = intParser <|> doubleParser <|> stringParser
Note that the order of the parsers does matter and use can use Parsec's try function if backtracking is needed.
So, for example, if you want to have a sum expression now, you can add to the data type
data Expr =
ExprDouble Double
| ExprInt Int
| ExprString String
| ExprSum Expr Expr
and make the parser
sumParser :: Parser Expr
sumParser = do
a <- exprParser
string " + "
b <- exprParser
return $ ExprSum a b
UPDATE
Well, I take my hat off to you diving straight into GADTs if you are just starting with Haskell. I have been reading through the paper you linked and noticed this immediately in the first paragraph:
The jury is still out on whether the additional type-safety provided by GADTs is worth the added inconvenience of working with them.
There are three points worth taking away here I think. The first is simply that I would have a go with the simpler way of doing things first, to get an idea of how it works and why you might want to add more type safety, before trying to more complicated type theoretical stuff. That comment may not help so feel free to ignore it!
Secondly, and more importantly, your representation...
data ExprF :: (* -> *) -> * -> * where
N :: Double -> ExprF r Double
S :: String -> ExprF r String
Add :: r Double -> r Double -> ExprF r Double
Cat :: r String -> r String -> ExprF r String
...is specifically designed to not allow ill formed type expressions. Contrasted with mine which can, eg ExprSum (ExprDouble 5.0) (ExprString "test"). So the question you really want to ask is what should actually happen when the parser attempts to parse something like "5.0 + \"test\""? Do you want it to just not parse, or do you want it to return a nice message saying that this expression is the wrong type? Compilers are usually designed in multiple stages for this reason. The first pass turns the input into an abstract syntax tree (AST), and further passes annotate this tree with type judgements. This annotated AST can then be transformed into the semantic representation that you really want it in.
So in your case I would recommend two stages. first, parse into a dumb representation like mine, that will give you the correct tree shape but allow ill-typed expressions. Like
data ExprAST =
ExprASTDouble Double
| ExprASTInt Int
| ExprASTString String
| ExprASTAdd Expr Expr
Then have another function that will typecheck the ExprAST. Something like
typecheck :: ExprAST -> Maybe (ExprF HFix a)
(You could also use Either and return either the typechecked GADT or an error string saying what the problem is.) The further problem here is that you don't know what a is statically. The other answer solves this by using type tags and an existential wrapper, which you might find to be the best way to go. I feel like it might be simpler to have a top level expression in your GADT that all expressions must live in, so an entire parse will always have the same type. In the end there is usually only one program type.
My third, and last, point is related to this
The jury is still out on whether the additional type-safety provided by GADTs is worth the added inconvenience of working with them.
The more type safety, generally the more work you have to do to get it. You mention you are new to Haskell, yet this adventure has taken us right to the edge of what it is capable of doing. The type of the parsed expression cannot depend only on the input string in a Haskell function, because it does not allow for dependant types. If you want to go down this path, I might suggest you have a look at a language called Idris. A great introduction to what it is capable of can be found in this video, in which he constructs a typesafe printf.
The problem described looks to be using Parsec to parse into a GADT representation, for which probably the easiest solution would be parse into a monotype representation and then have a (likely partial) type checking phase to produce the well-typed GADT, if it can. The monotype representation could be an existential wrapper over a GADT term, with a type-tag to reify the GADT index.
EDIT: a quick example
Let's define a type for type-tags and an existential wrapper:
data Type :: * -> * where
TDouble :: Type Double
TString :: Type String
data Judgement f = forall ix. Judgement (f ix) (Type ix)
With the example GADT given in the original post, we only have a problem with the outer-most production, which we need to parse to a monotype as we don't know statically which expression type we will get at runtime:
pExpr :: Parser (Judgement Expr)
pExpr = Judgement <$> pDblExpr <*> pure TDouble
<|> Judgement <$> pStrExpr <*> pure TString
We can write a type check phase to produce a GADT or fail, depending on whether the type assertion succeeds or not:
typecheck :: Judgement Expr -> Type ix -> Maybe (Expr ix)
typecheck (Judgement e TDouble) TDouble = Just e
typecheck (Judgement e TString) TString = Just e
typecheck _ _ = Nothing

Could not deduce (b ~ a)

So I have the following code, I'm trying to write an abstract syntax tree for an interpreter, & I prefer not to jam everything in the same data type, so I was going to write a typeclass that had the basic behaviour (in this case AST).
{-# LANGUAGE ExistentialQuantification #-}
import qualified Data.Map as M
-- ...
data Expr = forall a. AST a => Expr a
type Env = [M.Map String Expr]
class AST a where
reduce :: AST b => a -> Env -> b
-- when i remove the line below, it compiles fine
reduce ast _ = ast
-- ...
When I remove the default implementation of reduce in the typeclass AST it compiles fine, but when ever I provide an implementation that returns it's self it complains. I get the following compiler error
src/Data/AbstractSyntaxTree.hs:13:18:
Could not deduce (b ~ a)
from the context (AST a)
bound by the class declaration for `AST'
at src/Data/AbstractSyntaxTree.hs:(11,1)-(13,20)
or from (AST b)
bound by the type signature for reduce :: AST b => a -> Env -> b
at src/Data/AbstractSyntaxTree.hs:12:13-36
`b' is a rigid type variable bound by
the type signature for reduce :: AST b => a -> Env -> b
at src/Data/AbstractSyntaxTree.hs:12:13
`a' is a rigid type variable bound by
the class declaration for `AST'
at src/Data/AbstractSyntaxTree.hs:11:11
In the expression: ast
In an equation for `reduce': reduce ast _ = ast
The behaviour of AST's reduce will evaluate an AST, and occasionally return a different type of AST, and sometimes the same type of AST.
Edit: Regarding data Expr = forall a. AST a => Expr a & GADTs
I originally went with data Expr = forall a. AST a => Expr a because I wanted to represent types like this.
(+ 2 true) -> Compound [Expr (Ref "+"), Expr 2, Expr true]
(+ a 2) -> Compound [Expr (Ref "+"), Expr (Ref "a"), Expr 2]
(eval (+ 2 2)) -> Compound [Expr (Ref "eval"),
Compound [
Expr (Ref "+"),
Expr 2,
Expr 2]]
((lambda (a b) (+ a b)) 2 2) -> Compound [Expr SomeLambdaAST, Expr 2, Expr 2]
Since I'm generating ASTs from text I feel it would be a burden to represent a strictly typed ASTs in a GADT, although I do see where they could be useful in case like DSLs in Haskell.
But since I'm generating the AST from text (which could contain some of the examples above), it might be a bit hard to predict what AST I'll end up with. I don't want to start juggling between Eithers & Maybes. That is what I ended up doing last time & it was a mess, & I gave up trying to attempt this in Haskell.
But again I'm not the most experienced Haskell programmer so maybe I'm looking at this the wrong way, maybe I can implement an AST with so more rigours typing, so I'll have a look and see if I can come up with GADTs, but I have my doubts & I have a feeling that it might end the way it did last time.
Ultimately I'm just trying to learn Haskell at the moment with a fun finish able project, so I don't mind if my first Haskell project isn't really idiomatic Haskell. Getting something working is a higher priority just so I can make my way around the language and have something to show for it.
Update:
I've taken #cdk's & #bheklilr advice and ditched the existential type, although I've gone with a much simpler type, as opposed to utilising GADTs (also suggested by #cdk's & #bheklilr). It could possibly be a stronger type but again I'm just trying to get familiar with Haskell, so I gave up up after a few hours & went with a simple data type like so :P
import qualified Data.Map as M
type Environment = [M.Map String AST]
data AST
= Compound [AST]
| FNum Double
| Func Function
| Err String
| Ref String
data Function
= NativeFn ([AST] -> AST)
| LangFn [String] AST
-- previously called reduce
eval :: Environment -> AST -> AST
eval env ast = case ast of
Ref ref -> case (lookup ref env ) of
Just ast -> ast
Nothing -> Err ("Not in scope `" ++ ref ++ "'")
Compound elements -> case elements of
[] -> Err "You tried to invoke `()'"
function : args -> case (eval env function) of
Func fn -> invoke env fn args
other -> Err $ "Cannot invoke " ++ (show other)
_ -> ast
-- invoke & lookup are defined else where
Although I will still probably look at GADTs as they seem to be pretty interesting & have lead me to some interesting reading material regarding implementing abstract syntax trees in haskell.
What part of the error message are you having difficulties understanding? I think it's quite clear.
The type of reduce is
reduce :: AST b => a -> Env -> b
The first argument has type a and GHC expects reduce to return something of type b, which may be entirely different from a. GHC is correct to complain that you've tried to return a value of a when it expects b.
The "existential quantification with type class" is (as noted by bheklilr) an anti-pattern. A better approach would be to create an Algebraic Data Type for AST:
data AST a
now reduce becomes a simple function:
reduce :: Env -> AST a -> AST b
if you want reduce to be able to return a different type of AST, you could use Either
reduce :: Env -> AST a -> Either (AST a) (AST b)
but I don't think this is what you really want. My advice is to take a look at the GADT style of creating ASTs and re-evaluate your approach.
You are interpreting this type signature incorrectly (in a way that is common to OO programmers):
reduce :: AST b => a -> Env -> b
This does not mean that reduce can choose any type it likes (that is a member of AST) and return a value of that type. If it did, your implementation would be valid. Rather, it means that for any type b the caller likes (that is a member of AST), reduce must be able to return a value in that type. b could well be the same as a sometimes, but it's the caller's choice, not the choice of reduce.
If your implementation returns a value of type a, then this can only be true if b is always equal to a, which is what the compiler is on about when it reports failing to prove that b ~ a.
Haskell does not have subtypes. Type variables are not supertypes of all the concrete types that could instantiate them, as you might be used to using Object or abstract interface types in OO languages. Rather type variables are parameters; any implementation which claims to have a parametric type must work regardless of what types are chosen for the parameters.
If you want to use a design where reduce can return a value in whatever type of AST it feels like (rather than whatever type of AST is asked of it), then you need to use your Expr box again, since Expr is not parameterized by the type of AST it contains, but can still contain any AST:
reduce :: a -> Env -> Expr
reduce ast _ = Expr ast
Now reduce can work regardless of the types chosen for its type parameters, since there's only a. Consumers of the returned Expr will have no way of constraining the type inside the Expr, so they'll have to be written to work regardless of what that type is.
Your default implementation doesn't compile, because it has the wrong definition.
reduce :: AST b => a -> b -> Env -> b
reduce ast _ = ast
Now ast has the type a and reduce function returns type b but according to your implementation you return ast which is of type a but the compiler expects b.
Even something like this will work:
reduce :: AST b => a -> b -> Env -> b
reduce _ ast _ = ast

Resources