Haskell AST with recursive types

I'm currently trying to build a lambda calculus solver, and I'm having a slight problem with constructing the AST. A lambda calculus term is inductively defined as:
1) A variable
2) A lambda, a variable, a dot, and a lambda expression.
3) An opening bracket, a lambda expression, a lambda expression, and a closing bracket.
What I would like to do (and at first tried) is this:
data Expr =
    Variable
  | Abstract Variable Expr
  | Application Expr Expr
Now obviously this doesn't work, since Variable is not a type, and Abstract Variable Expr expects types. So my hacky solution to this is to have:
type Variable = String
data Expr =
    Atomic Variable
  | Abstract Variable Expr
  | Application Expr Expr
Now this is really annoying, since I don't like having Atomic wrap a Variable on its own while Abstract takes a string rather than an Expr. Is there any way I can make this more elegant and do it like the first solution?

Your first solution is simply an erroneous definition without meaning. Variable is not a type there; it's a nullary value constructor. You can't refer to Variable in a type definition, just as you can't refer to any other value, like True, False or 100.
The second solution is in fact the direct translation of something we could write in BNF:
var ::= <string>
term ::= λ <var>. <term> | <term> <term> | <var>
And thus there is nothing wrong with it.

What you exactly want is a type like
data Expr
  = Atomic Variable
  | Abstract Expr Expr
  | Application Expr Expr
but with the first Expr of the Abstract constructor constrained to be Atomic only. There is no straightforward way to do this in Haskell, because a value of a type can be created by any constructor of that type. So the only approach is to introduce a separate data type, or a type alias for an existing type (like your Variable alias), and move all the common logic into it. Your solution with Variable looks perfectly fine to me.
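For illustration, here is a small sketch of the kind of code the Variable alias supports, with the question's definitions repeated for completeness; the freeVars helper is my own example, not something from the question:
import Data.List (nub)

type Variable = String

data Expr
  = Atomic Variable
  | Abstract Variable Expr
  | Application Expr Expr

-- Collect the free variables of a term: binders and occurrences both
-- read naturally because they share the Variable alias.
freeVars :: Expr -> [Variable]
freeVars (Atomic v)        = [v]
freeVars (Abstract v body) = filter (/= v) (freeVars body)
freeVars (Application f x) = nub (freeVars f ++ freeVars x)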
But you can use some more advanced Haskell features to achieve your goal in a different way. You can take inspiration from the glambda package, which uses a GADT to create a typed lambda calculus. Also see this answer: https://stackoverflow.com/a/39931015/2900502
Here is a solution that achieves your minimal goal (if you only want to constrain the first argument of Abstract):
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
data AtomicT
data AbstractT
data ApplicationT
data Expr :: * -> * where
  Atomic      :: String -> Expr AtomicT
  Abstract    :: Expr AtomicT -> Expr a -> Expr AbstractT
  Application :: Expr a -> Expr b -> Expr ApplicationT
And the next example works fine:
ex1 :: Expr AbstractT
ex1 = Abstract (Atomic "x") (Atomic "x")
But this example won't compile because of a type mismatch:
ex2 :: Expr AbstractT
ex2 = Abstract ex1 ex1
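Functions over this GADT can still treat all three cases uniformly by staying polymorphic in the index. For example, a pretty-printer (my own sketch, assuming the definitions above and the GADTs extension in scope):
render :: Expr a -> String
render (Atomic name)         = name
render (Abstract param body) = "\\" ++ render param ++ ". " ++ render body
render (Application f x)     = "(" ++ render f ++ " " ++ render x ++ ")"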

Related

Type design for the AST of my language remembering token locations

I wrote a parser and evaluator for a simple programming language. Here is a simplified version of the types for the AST:
data Value = IntV Int | FloatV Float | BoolV Bool
data Expr = IfE Value [Expr] | VarDefE String Value
type Program = [Expr]
I want error messages to tell the line and column of the source code in which the error occurred. For example, if the value in an If expression is not a boolean, I want the evaluator to show an error saying "expected boolean at line x, column y", with x and y referring to the location of the value.
So, what I need to do is redefine the previous types so that they can store the relevant locations of different things. One option would be to add a location to each constructor for expressions, like so:
type Location = (Int, Int)
data Expr = IfE Value [Expr] Location | VarDef String Value Location
This clearly isn't optimal, because I have to add those Location fields to every possible expression, and if for example a value contained other values, I would need to add locations to that value too:
{-
this would turn into FunctionCall String [Value] [Location],
with one location for each value in the function call
-}
data Value = ... | FunctionCall String [Value]
I came up with another solution, which allows me to add locations to everything:
data Located a = Located Location a
type LocatedExpr = Located Expr
type LocatedValue = Located Value
data Value = IntV Int | FloatV Float | BoolV Bool | FunctionCall String [LocatedValue]
data Expr = IfE LocatedValue [LocatedExpr] | VarDef String LocatedValue
type Program = [LocatedExpr]
However I don't like this that much. First of all, it clutters the definition of the evaluator and pattern matching has an extra layer every time. Also, I don't think saying that a function call takes located values as arguments is quite right. Function calls should take values as arguments, and locations should be metadata that doesn't interfere with the evaluator.
I need help redefining my types so that the solution is as clean as possible. Maybe there is a language extension or a design pattern I don't know about that could be helpful.
There are many ways to annotate an AST! This is half of what’s known as the AST typing problem, the other half being how you manage an AST that changes over the course of compilation. The problem isn’t exactly “solved”: all of the solutions have tradeoffs, and which one to pick depends on your expected use cases. I’ll go over a few that you might like to investigate at the end.
Whichever method you choose for organising the actual data types, if it makes pattern-matching ugly or unwieldy, the natural solution is PatternSynonyms.
Considering your first example:
{-# Language PatternSynonyms #-}
type Location = (Int, Int)
data Expr
  = LocatedIf Value [Expr] Location
  | LocatedVarDef String Value Location
-- Unidirectional pattern synonyms which ignore the location:
pattern If :: Value -> [Expr] -> Expr
pattern If val exprs <- LocatedIf val exprs _loc
pattern VarDef :: String -> Value -> Expr
pattern VarDef name expr <- LocatedVarDef name expr _loc
-- Inform GHC that matching ‘If’ and ‘VarDef’ is just as good
-- as matching ‘LocatedIf’ and ‘LocatedVarDef’.
{-# Complete If, VarDef #-}
This may be sufficiently tidy for your purposes already. But here are a few more tips that I find helpful.
Put annotations first: when adding an annotation type to an AST directly, I often prefer to place it as the first parameter of each constructor, so that it can be conveniently partially applied.
data LocatedExpr
  = LocatedIf Location Value [Expr]
  | LocatedVarDef Location String Value
If the annotation is a location, then this also makes it more convenient to obtain when writing certain kinds of parsers, along the lines of AnnotatedIf <$> (getSourceLocation <* ifKeyword) <*> value <*> many expr in a parser combinator library.
Parameterise your annotations: I often make the annotation type into a type parameter, so that GHC can derive some useful classes for me:
{-# Language
DeriveFoldable,
DeriveFunctor,
DeriveTraversable #-}
data AnnotatedExpr a
  = AnnotatedIf a Value [AnnotatedExpr a]
  | AnnotatedVarDef a String Value
  deriving (Functor, Foldable, Traversable)
type LocatedExpr = AnnotatedExpr Location
-- Get the annotation of an expression.
-- (Total as long as every constructor is annotated.)
exprAnnotation :: AnnotatedExpr a -> a
exprAnnotation (AnnotatedIf     a _ _) = a
exprAnnotation (AnnotatedVarDef a _ _) = a
-- Update annotations purely.
mapAnnotations
  :: (a -> b)
  -> AnnotatedExpr a -> AnnotatedExpr b
mapAnnotations = fmap
-- traverse, foldMap, &c.
If you want “doesn’t interfere”, use polymorphism: you can enforce that the evaluator can’t inspect the annotation type by being polymorphic over it. Pattern synonyms still let you match on these expressions conveniently:
pattern If :: Value -> [AnnotatedExpr a] -> AnnotatedExpr a
pattern If val exprs <- AnnotatedIf _anno val exprs
-- …
eval :: AnnotatedExpr a -> Value
eval expr = case expr of
  If val exprs -> -- …
  VarDef name expr -> -- …
Unannotated terms aren’t your enemy: a term without source locations is no good for error reporting, but I think it’s still a good idea to make the pattern synonyms bidirectional for the convenience of constructing unannotated terms with a unit () annotation. (Or something equivalent, if you use e.g. Maybe Location as the annotation type.)
The reason is that this is quite convenient for writing unit tests, where you want to check the output, but want to use Eq instead of pattern matching, and don’t want to have to compare all the source locations in tests that aren’t concerned with them. Using the derived classes, void :: (Functor f) => f a -> f () strips out all the annotations on an AST.
import Control.Monad (void)
type BareExpr = AnnotatedExpr ()
-- One way to define bidirectional synonyms, so e.g.
-- ‘If’ can be used as either a pattern or a constructor.
pattern If :: Value -> [BareExpr] -> BareExpr
pattern If val exprs = AnnotatedIf () val exprs
-- …
stripAnnotations :: AnnotatedExpr a -> BareExpr
stripAnnotations = void
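As a concrete usage sketch, a test helper along these lines (the helper name is mine; it assumes Eq instances as noted in the comment):
import Control.Monad (void)

-- Hypothetical test helper: are two located terms equal once we ignore
-- where in the source they came from? Assumes an Eq instance for the
-- unannotated tree, e.g. ‘deriving instance Eq a => Eq (AnnotatedExpr a)’
-- with StandaloneDeriving, plus Eq Value.
sameTermIgnoringLocations :: LocatedExpr -> LocatedExpr -> Bool
sameTermIgnoringLocations x y = void x == void y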
Equivalently, you could use GADTs / ExistentialQuantification to say data AnyExpr where { AnyExpr :: AnnotatedExpr a -> AnyExpr } / data AnyExpr = forall a. AnyExpr (AnnotatedExpr a); that way, the annotations have exactly as much information as (), but you don’t need to fmap over the entire tree with void in order to strip it, just apply the AnyExpr constructor to hide the type.
Finally, here are some brief introductions to a few AST typing solutions.
Annotate each AST node with a tag (e.g. a unique ID), then store all metadata like source locations, types, and whatever else, separately from the AST:
import Data.IntMap (IntMap)
-- More sophisticated / more strongly typed tags are possible.
newtype Tag = Tag Int
newtype TagMap a = TagMap (IntMap a)
data Expr
  = If !Tag Value [Expr]
  | VarDef !Tag String Expr
type Span = (Location, Location)
type SourceMap = TagMap Span
type CommentMap = TagMap (Span, String)
parse
  :: String -- Input
  -> Either ParseError
       ( Expr       -- Parsed expression
       , SourceMap  -- Source locations of tags
       , CommentMap -- Sideband for comments
       -- …
       )
The advantage is that you can very easily mix in arbitrary new types of annotations anywhere, without affecting the AST itself, and avoid rewriting the AST just to change annotations. You can think of the tree and annotation tables as a kind of database, where the tags are the “foreign keys” relating them. A downside is that you must be careful to maintain these tags when you do rewrite the AST.
I don’t know if this approach has an established name; I think of it as just “tagging” or a “tagged AST”.
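A brief sketch of how the sideband tables get consumed; lookupTag, spanOf, and exprTag are illustrative names of mine, not from any library:
import qualified Data.IntMap as IntMap

lookupTag :: Tag -> TagMap a -> Maybe a
lookupTag (Tag i) (TagMap m) = IntMap.lookup i m

-- Report the source span of a node, e.g. for an error message.
spanOf :: Expr -> SourceMap -> Maybe Span
spanOf expr sourceMap = lookupTag (exprTag expr) sourceMap
  where
    exprTag (If     tag _ _) = tag
    exprTag (VarDef tag _ _) = tag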
recursion-schemes and/or Data Types à la Carte (PDF): separate out the “recursive” part of an annotated expression tree from the “annotation” part, and use Fix to tie them back together, with Compose (or Cofree) to add annotations in the middle.
data ExprF e
  = IfF Value [e]
  | VarDefF String e
  -- …
  deriving (Foldable, Functor, Traversable, …)
-- Unannotated: Expr ~ ExprF (ExprF (ExprF (…)))
type Expr = Fix ExprF
-- With a location at each recursive step:
--
-- LocatedExpr ~ Located (ExprF (Located (ExprF (…))))
type LocatedExpr = Fix (Compose Located ExprF)
data Located a = Located Location a
  deriving (Foldable, Functor, Traversable, …)
-- or: type Located = (,) Location
A distinct advantage is that you get a bunch of nice traversal stuff like cata for free-ish, so you can avoid having to write manual traversals over your AST over and over. A downside is that it adds some pattern clutter to clean up, as does the “à la carte” approach, but they do offer a lot of flexibility.
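As a sketch of those free-ish traversals, here is a node counter written with cata; Fix, Compose, and cata normally come from recursion-schemes and Data.Functor.Compose, but are inlined here so the example stands alone:
newtype Fix f = Fix { unFix :: f (Fix f) }

newtype Compose f g a = Compose { getCompose :: f (g a) }

instance (Functor f, Functor g) => Functor (Compose f g) where
  fmap h (Compose x) = Compose (fmap (fmap h) x)

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

-- Count every node of a located expression in a single fold.
countNodes :: LocatedExpr -> Int
countNodes = cata (\(Compose (Located _ e)) -> 1 + sum e)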
Trees That Grow (PDF) is overkill for just source locations, but in a serious compiler it’s quite helpful. If you expect to have more than one annotation type (such as inferred types or other analysis results) or an AST that changes over time, then you add a type parameter for the “compilation phase” (parsed, renamed, typechecked, desugared, &c.) and select field types or enable & disable constructors based on that index.
A really unfortunate downside of this is that you often have to rewrite the tree even in places nothing has changed, because everything depends on the “phase”. An alternative that I use is to add one type parameter for each type of phase or annotation that can vary independently, e.g. data Expr annotation termVarName typeVarName, and abstract over that with type and pattern synonyms. This lets you update indices independently and still use classes like Functor and Bitraversable.
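To make that concrete, a minimal sketch of the Trees That Grow shape, with made-up names (Phase, XIf, TypeInfo) and the Value/Location types repeated from the question:
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies #-}

type Location = (Int, Int)
data Value = IntV Int | FloatV Float | BoolV Bool

data Phase = Parsed | Typechecked

data TypeInfo = IntType | FloatType | BoolType  -- stand-in for analysis results

-- What an If node carries depends on the compilation phase.
type family XIf (p :: Phase)
type instance XIf 'Parsed      = Location
type instance XIf 'Typechecked = (Location, TypeInfo)

data Expr (p :: Phase)
  = If (XIf p) Value [Expr p]
  | VarDef Location String Value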

Writing an interpreter for an imperative language in Haskell

I am trying to build an interpreter for a C-like language in Haskell. So far I have written and combined small monadic parsers following this paper, so I can generate an AST representation of a program. I defined the abstract syntax as follows:
data LangType = TypeReal | TypeInt | TypeBool | TypeString deriving (Show)
type Id = String
data AddOp = Plus | Minus | Or deriving (Show)
data RelOp = LT | GT | LTE | GTE | NEq | Eq deriving (Show)
data MultOp = Mult | Div | And deriving (Show)
data UnOp = UnMinus | UnNot deriving (Show)
data BinOp = Rel RelOp | Mul MultOp | Add AddOp deriving (Show)
data AST = Program [Statement] deriving (Show)
data Block = StatsBlock [Statement] deriving (Show)
data Statement = VariableDecl Id LangType Expression
               | Assignment Id Expression
               | PrintStatement Expression
               | IfStatement Expression Block Block
               | WhileStatement Expression Block
               | ReturnStatement Expression
               | FunctionDecl Id LangType FormalParams Block
               | BlockStatement Block
               deriving (Show)
data Expression = RealLiteral Double
                | IntLiteral Int
                | BoolLiteral Bool
                | StringLiteral String
                | Unary UnOp Expression
                | Binary BinOp Expression Expression
                | FuncCall Id [Expression]
                | Var Id
                deriving (Show)
data FormalParams = IdentifierType [(Id, LangType)] deriving (Show)
I have yet to type-check my AST and build the interpreter to evaluate expressions and execute statements. My questions are the following:
Does the abstract syntax make sense/can it be improved? In particular, I've been running into a recurring problem. In the EBNF of this language I'm trying to interpret, a WhileStatement consists of an Expression (which I have no problem with) and a Block, which in the EBNF happens to be a Statement just like WhileStatement, and so I cannot refer to Block from my WhileStatement. I've worked around this by defining a separate data type Block (as is shown in the above code), but am not sure if this is the best way. I'm finding defining data types quite confusing.
Since I have to type-check my AST and evaluate/execute, do I implement these separately or can I define some function which does them both at the same time?
Any general tips on how I should go about type-checking and interpreting the language would also be greatly appreciated. Since the language has variable and function declarations, I am thinking of implementing some sort of symbol table, although again I am struggling with defining the type for this. So far I've tried
import qualified Data.Map as M
data Value = RealLit Double | IntLit Int | BoolLit Bool | StringLit String | Func [FormalParams] String
  deriving (Show)
type TermEnv = M.Map String Value
but I'm unsure whether I should be using my LangType from before.
Addressing your question in the comments about how to proceed with type checking and evaluation.
If you don't have to do inference or polymorphism, type checking is pretty simple. Also type checking and evaluation mirror each other pretty closely in these conditions.
Begin by defining a monad with the features you need. For a type checker, you will need
A type environment, i.e. a Reader (Map Id LangType) component, to keep track of the types of local variables.
An error ability, e.g. Except String.
So you could define a monad like
type TypeEnv = Map.Map Id LangType
type TC = ReaderT TypeEnv (Except String)
And then your typechecker function would look like:
typeCheck :: AST -> TC ()
(We return () because there is nothing interesting to be gained from the typechecking process besides knowing whether the program passed.)
This will be largely structurally inductive, e.g.
typeCheck (Program stmts) = ... -- typeCheckStmt each statement
typeCheckStmt :: Statement -> TC ()
typeCheckStmt (VariableDecl v ty defn) = ...
typeCheckStmt (Assignment v exp) = do
  Just t <- asks (Map.lookup v)
  t' <- typeCheckExp exp
  when (t /= t') $ throwError "Types do not match"
...
-- Return the type of a composite expression to use elsewhere
typeCheckExp :: Expression -> TC LangType
...
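For instance, the expression cases might start like this (a sketch only: it assumes the TC monad and AST types above, the usual mtl imports, and that LangType also derives Eq):
import Control.Monad (when)
import Control.Monad.Reader (asks)
import Control.Monad.Except (throwError)
import qualified Data.Map as Map

typeCheckExp :: Expression -> TC LangType
typeCheckExp (RealLiteral _)   = pure TypeReal
typeCheckExp (IntLiteral _)    = pure TypeInt
typeCheckExp (BoolLiteral _)   = pure TypeBool
typeCheckExp (StringLiteral _) = pure TypeString
typeCheckExp (Var v) = do
  mty <- asks (Map.lookup v)
  maybe (throwError ("Variable not in scope: " ++ v)) pure mty
typeCheckExp (Unary UnNot e) = do
  t <- typeCheckExp e
  when (t /= TypeBool) $ throwError "`not` expects a Bool"
  pure TypeBool
typeCheckExp _ = throwError "not handled in this sketch"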
There will be a bit of finesse required to make sure that variable declarations in a list of statements can be seen by later statements in the same list. I will leave that as a puzzle. (Hint: see the local function to provide an updated environment within a scope.)
Evaluation is a similar story. You're correct that you need a type of run-time values. Without some cleverness that you are probably not ready for (and which is of questionable utility even if you were), there is not really a way to use LangType in Value, so you're on the right track.
You will need a monad that supports keeping track of the values of variables and the ability to do whatever else your language needs. To start I recommend
type Eval = StateT (Map Id Value) IO
and proceed structurally as before. There will again be some finesse required when handling variable scopes and shadowing, and you may need to change the environment type or mess with your Value type a bit to accommodate these subtleties, but thinking through these problems is important. Start simple, don't try to implement typechecking and evaluation for your whole language at once.
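To give the flavour, here are a couple of evaluator cases under the same assumptions (the Value type is the one from the question; failures are left as calls to error here, where a real interpreter might add an ExceptT layer):
import Control.Monad.State (gets, modify)
import Control.Monad.IO.Class (liftIO)
import qualified Data.Map as Map

evalExp :: Expression -> Eval Value
evalExp (IntLiteral n)  = pure (IntLit n)
evalExp (BoolLiteral b) = pure (BoolLit b)
evalExp (Var v) = do
  mval <- gets (Map.lookup v)
  maybe (error ("Unbound variable " ++ v)) pure mval
evalExp _ = error "not handled in this sketch"

evalStmt :: Statement -> Eval ()
evalStmt (PrintStatement e) = evalExp e >>= liftIO . print
evalStmt (Assignment v e)   = evalExp e >>= modify . Map.insert v
evalStmt _ = error "not handled in this sketch"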

Is there a better way to add attribute field into AST in Haskell?

At first, I have an original AST definition like this:
data Expr = LitI Int | LitB Bool | Add Expr Expr
And I want to generalize it so that each AST node can contain some extra attributes:
data Expr a = LitI Int a | LitB Bool a | Add (Expr a) (Expr a) a
In this way, we can easily attach an attribute to each node of the AST:
type ExprWithType = Expr TypeRep
type ExprWithSize = Expr Int
But this solution makes it hard to access the attribute field: we must pattern match and handle each case separately:
attribute :: Expr a -> a
attribute e = case e of
  LitI _ a -> a
  LitB _ a -> a
  Add _ _ a -> a
We can imagine defining our AST as a product type of the original AST and the type variable carrying the attribute:
type ExprWithType = (Expr, TypeRep)
type ExprWithSize = (Expr, Int)
Then we could simplify the attribute accessor like this:
attribute = snd
But we know that the attribute from the outermost product type will not recursively appear in the subtrees.
So, is there a better solution for this problem?
Generally speaking, we run into this problem whenever we want to extract a common field from the different cases of a recursive sum type.
You could "lift" the type of the Expr for example like:
data Expr e = LitI Int | LitB Bool | Add e e
Now we can define a data type like:
data ExprAttr a = ExprAttr {
    expression :: Expr (ExprAttr a),
    attribute :: a
  }
So here ExprAttr has two fields: expression, which is an Expr whose subtrees are themselves ExprAttr a values, and attribute, which is an a.
You can thus process ExprAttr a values, which form an AST of ExprAttrs. If you want a "simple" AST, you can define a type like:
newtype SimExpr = SimExpr (Expr SimExpr)
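A small usage sketch (sizeOf and example are made-up names): the attribute is now just a record selector, and an annotated tree pairs every node with its attribute.
-- The size of the whole tree is stored at the root.
sizeOf :: ExprAttr Int -> Int
sizeOf = attribute

example :: ExprAttr Int
example =
  ExprAttr
    { expression = Add (ExprAttr (LitI 1) 1)
                       (ExprAttr (LitB True) 1)
    , attribute  = 3
    }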
You might want to take a look at Cofree f a, where f is your recursive data type with the recursion abstracted out as a base functor, and a is the type of your annotation.
Nate Faubion gave a very accessible talk about this and similar approaches, and you can watch it here: https://www.youtube.com/watch?v=eKkxmVFcd74
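For reference, a minimal sketch of the Cofree version; Cofree normally comes from the free package (Control.Comonad.Cofree) and is inlined here, with a Functor instance derived for the lifted Expr:
{-# LANGUAGE DeriveFunctor #-}

data Cofree f a = a :< f (Cofree f a)

data Expr e = LitI Int | LitB Bool | Add e e
  deriving Functor

type ExprWithAttr a = Cofree Expr a

-- The attribute of the root is immediate, just as with ExprAttr.
attribute :: Cofree f a -> a
attribute (a :< _) = a

example :: ExprWithAttr Int
example = 3 :< Add (1 :< LitI 1) (1 :< LitB True)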

Omitting data type constructors

I'm trying to implement the following in Haskell:
0,1,2,...:N
x,y,z,...:V
+,*,-,/,...:F
F alias for Expr -> Expr -> Expr
Expr := N|V|F Expr Expr
My question is first:
Is the grammar flawed at type level? Does it make sense? All terms look like they'd type check (allowing for 0,1,... to be both Expr and N subtype, and x,y,... to be both Expr and V subtype).
And secondarily, what's the closest Haskell implementation? My current Haskell implementation is:
data F = +|-|*|...
data Expr = N|V|MakeExpr F Expr Expr
Any suggestions?
EDIT -
The key difference between the grammar and implementation is that type constructor is implicit /omitted in the grammar. Why are type constructors compulsory in Haskell?
The reason data constructors¹ are compulsory in Haskell is specifically to ensure that you can't have x, y, ... be both Expr and V subtypes.
So your grammar looks like a reasonable model for how you want your language terms to work. But it doesn't make sense as a direct design for how you want to represent your language terms as Haskell data types.
Basically, Haskell deliberately does not have subtypes. It ensures that when you create a new type (with newtype or data) that all of the values of the new type are distinct from the values of all other existing types (and all types that will be created in future). It does this by having values of user-defined types always appear inside constructors (and making it impossible to "reuse" constructors; you always make new ones whenever you make a new type).
The way Haskell's type system works depends on this lack of subtyping. You could design a language that allowed subtypes (see Scala, perhaps). But it just fundamentally wouldn't be Haskell.
But what you can do instead is define something like:
data Expr
  = ExprN N
  | ExprV V
  | ExprF F Expr Expr
You still can't have a N value and just use it as an Expr. But you can just apply ExprN to it, and then you have an Expr. And it's really no more burden than if Haskell allowed you to use some n of type N as an Expr as well, but only required you to add a type annotation clarifying that that's what you meant; you just have to say ExprN n instead of n :: Expr.
Similarly when you have an Expr and you want to apply a function on N to it, the case statement to extract the N from the ExprN constructor (if it's there) isn't really any more code than you'd have to write to check if your Expr was actually an N.
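A small sketch of that wrapping and unwrapping in practice; N, V, F, and negateN here are placeholders for whatever your real number, variable and operator types are:
newtype N = N Int
newtype V = V String
data F = Plus | Minus | Times

data Expr
  = ExprN N
  | ExprV V
  | ExprF F Expr Expr

negateN :: N -> N
negateN (N n) = N (negate n)

-- Applying a function on N to an Expr means checking the constructor:
negateExpr :: Expr -> Maybe Expr
negateExpr (ExprN n) = Just (ExprN (negateN n))
negateExpr _         = Nothing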
1 "Type constructor" is a specific term in Haskell, which isn't what we're talking about here. I'm pretty sure what you meant by that was "the constructors for a type", but to be pedantic you accidentally referred to a different thing by using that term.
To clear it up, when you declare a type like data Maybe a = Nothing | Just a, Nothing and Just are new data constructors ("constructor" on its own is extremely likely to mean a data constructor) and Maybe is a new type constructor.

Could not deduce (b ~ a)

So I have the following code. I'm trying to write an abstract syntax tree for an interpreter, and I prefer not to jam everything into the same data type, so I was going to write a typeclass that has the basic behaviour (in this case AST).
{-# LANGUAGE ExistentialQuantification #-}
import qualified Data.Map as M
-- ...
data Expr = forall a. AST a => Expr a
type Env = [M.Map String Expr]
class AST a where
  reduce :: AST b => a -> Env -> b
  -- when I remove the line below, it compiles fine
  reduce ast _ = ast
-- ...
When I remove the default implementation of reduce in the typeclass AST it compiles fine, but whenever I provide an implementation that returns itself it complains. I get the following compiler error:
src/Data/AbstractSyntaxTree.hs:13:18:
Could not deduce (b ~ a)
from the context (AST a)
bound by the class declaration for `AST'
at src/Data/AbstractSyntaxTree.hs:(11,1)-(13,20)
or from (AST b)
bound by the type signature for reduce :: AST b => a -> Env -> b
at src/Data/AbstractSyntaxTree.hs:12:13-36
`b' is a rigid type variable bound by
the type signature for reduce :: AST b => a -> Env -> b
at src/Data/AbstractSyntaxTree.hs:12:13
`a' is a rigid type variable bound by
the class declaration for `AST'
at src/Data/AbstractSyntaxTree.hs:11:11
In the expression: ast
In an equation for `reduce': reduce ast _ = ast
The intended behaviour of AST's reduce is to evaluate an AST, occasionally returning a different type of AST and sometimes the same type of AST.
Edit: Regarding data Expr = forall a. AST a => Expr a & GADTs
I originally went with data Expr = forall a. AST a => Expr a because I wanted to represent types like this.
(+ 2 true) -> Compound [Expr (Ref "+"), Expr 2, Expr true]
(+ a 2) -> Compound [Expr (Ref "+"), Expr (Ref "a"), Expr 2]
(eval (+ 2 2)) -> Compound [Expr (Ref "eval"),
                            Compound [
                              Expr (Ref "+"),
                              Expr 2,
                              Expr 2]]
((lambda (a b) (+ a b)) 2 2) -> Compound [Expr SomeLambdaAST, Expr 2, Expr 2]
Since I'm generating ASTs from text, I feel it would be a burden to represent strictly typed ASTs in a GADT, although I do see where they could be useful in cases like DSLs in Haskell.
But since I'm generating the AST from text (which could contain some of the examples above), it might be a bit hard to predict what AST I'll end up with. I don't want to start juggling between Eithers & Maybes. That is what I ended up doing last time & it was a mess, & I gave up trying to attempt this in Haskell.
But again I'm not the most experienced Haskell programmer, so maybe I'm looking at this the wrong way; maybe I can implement an AST with more rigorous typing, so I'll have a look and see if I can come up with something using GADTs, but I have my doubts and a feeling it might end the way it did last time.
Ultimately I'm just trying to learn Haskell at the moment with a fun, finishable project, so I don't mind if my first Haskell project isn't really idiomatic Haskell. Getting something working is a higher priority, just so I can make my way around the language and have something to show for it.
Update:
I've taken cdk's and bheklilr's advice and ditched the existential type, although I've gone with a much simpler type, as opposed to utilising GADTs (also suggested by cdk and bheklilr). It could possibly be a stronger type, but again I'm just trying to get familiar with Haskell, so I gave up after a few hours and went with a simple data type like so :P
import qualified Data.Map as M
type Environment = [M.Map String AST]
data AST
  = Compound [AST]
  | FNum Double
  | Func Function
  | Err String
  | Ref String
data Function
  = NativeFn ([AST] -> AST)
  | LangFn [String] AST
-- previously called reduce
eval :: Environment -> AST -> AST
eval env ast = case ast of
  Ref ref -> case lookup ref env of
    Just ast -> ast
    Nothing  -> Err ("Not in scope `" ++ ref ++ "'")
  Compound elements -> case elements of
    [] -> Err "You tried to invoke `()'"
    function : args -> case eval env function of
      Func fn -> invoke env fn args
      other   -> Err $ "Cannot invoke " ++ show other
  _ -> ast
-- invoke & lookup are defined elsewhere
Although I will still probably look at GADTs, as they seem to be pretty interesting and have led me to some interesting reading material regarding implementing abstract syntax trees in Haskell.
What part of the error message are you having difficulties understanding? I think it's quite clear.
The type of reduce is
reduce :: AST b => a -> Env -> b
The first argument has type a and GHC expects reduce to return something of type b, which may be entirely different from a. GHC is correct to complain that you've tried to return a value of a when it expects b.
The "existential quantification with type class" is (as noted by bheklilr) an anti-pattern. A better approach would be to create an Algebraic Data Type for AST:
data AST a
now reduce becomes a simple function:
reduce :: Env -> AST a -> AST b
If you want reduce to be able to return a different type of AST, you could use Either:
reduce :: Env -> AST a -> Either (AST a) (AST b)
but I don't think this is what you really want. My advice is to take a look at the GADT style of creating ASTs and re-evaluate your approach.
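To give a rough idea of what the GADT style looks like (a sketch with made-up constructors, nothing like a full design; glambda is a much more thorough example):
{-# LANGUAGE GADTs #-}

data AST a where
  NumLit  :: Double -> AST Double
  BoolLit :: Bool -> AST Bool
  Add     :: AST Double -> AST Double -> AST Double
  If      :: AST Bool -> AST a -> AST a -> AST a

-- Evaluation returns a plain Haskell value of the right type, and
-- ill-typed trees cannot even be constructed.
eval :: AST a -> a
eval (NumLit d)  = d
eval (BoolLit b) = b
eval (Add x y)   = eval x + eval y
eval (If c t e)  = if eval c then eval t else eval e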
You are interpreting this type signature incorrectly (in a way that is common to OO programmers):
reduce :: AST b => a -> Env -> b
This does not mean that reduce can choose any type it likes (that is a member of AST) and return a value of that type. If it did, your implementation would be valid. Rather, it means that for any type b the caller likes (that is a member of AST), reduce must be able to return a value in that type. b could well be the same as a sometimes, but it's the caller's choice, not the choice of reduce.
If your implementation returns a value of type a, then this can only be true if b is always equal to a, which is what the compiler is on about when it reports failing to prove that b ~ a.
Haskell does not have subtypes. Type variables are not supertypes of all the concrete types that could instantiate them, as you might be used to using Object or abstract interface types in OO languages. Rather type variables are parameters; any implementation which claims to have a parametric type must work regardless of what types are chosen for the parameters.
If you want to use a design where reduce can return a value in whatever type of AST it feels like (rather than whatever type of AST is asked of it), then you need to use your Expr box again, since Expr is not parameterized by the type of AST it contains, but can still contain any AST:
reduce :: a -> Env -> Expr
reduce ast _ = Expr ast
Now reduce can work regardless of the types chosen for its type parameters, since there's only a. Consumers of the returned Expr will have no way of constraining the type inside the Expr, so they'll have to be written to work regardless of what that type is.
Your default implementation doesn't compile, because it has the wrong definition.
reduce :: AST b => a -> Env -> b
reduce ast _ = ast
Here ast has type a and reduce is declared to return type b, but your implementation returns ast, which is of type a where the compiler expects b.
Even something like this will work, because now the value being returned comes from an argument that already has type b:
reduce :: AST b => a -> b -> Env -> b
reduce _ ast _ = ast
