Writing an interpreter for an imperative language in Haskell - haskell

I am trying to build an interpreter for a C-like language in Haskell. I have so far written and combined small monadic parsers following this paper, hence so far I can generate an AST representation of a program. I defined the abstract syntax as follows:
data LangType = TypeReal | TypeInt | TypeBool | TypeString deriving (Show)
type Id = String
data AddOp = Plus | Minus | Or deriving (Show)
data RelOp = LT | GT | LTE | GTE | NEq | Eq deriving (Show)
data MultOp = Mult | Div | And deriving (Show)
data UnOp = UnMinus | UnNot deriving (Show)
data BinOp = Rel RelOp | Mul MultOp | Add AddOp deriving (Show)
data AST = Program [Statement] deriving (Show)
data Block = StatsBlock [Statement] deriving (Show)
data Statement = VariableDecl Id LangType Expression
| Assignment Id Expression
| PrintStatement Expression
| IfStatement Expression Block Block
| WhileStatement Expression Block
| ReturnStatement Expression
| FunctionDecl Id LangType FormalParams Block
| BlockStatement Block
deriving (Show)
data Expression = RealLiteral Double
| IntLiteral Int
| BoolLiteral Bool
| StringLiteral String
| Unary UnOp Expression
| Binary BinOp Expression Expression
| FuncCall Id [Expression]
| Var Id
deriving (Show)
data FormalParams = IdentifierType [(Id, LangType)] deriving (Show)
I have yet to type-check my AST and build the interpreter to evaluate expressions and execute statements. My questions are the following:
Does the abstract syntax make sense/can it be improved? In particular, I've been running into a recurring problem. In the EBNF of this language I'm trying to interpret, a WhileStatement consists of an Expression (which I have no problem with) and a Block, which in the EBNF happens to be a Statement just like WhileStatement, and so I cannot refer to Block from my WhileStatement. I've worked around this by defining a separate data type Block (as is shown in the above code), but am not sure if this is the best way. I'm finding defining data types quite confusing.
Since I have to type-check my AST and evaluate/execute, do I implement these separately or can I define some function which does them both at the same time?
Any general tips on how I should go about type-checking and interpreting the language would also be greatly appreciated. Since the language has variable and function declarations, I am thinking of implementing some sort of symbol table, although again I am struggling with defining the type for this. So far I've tried
import qualified Data.Map as M
data Value = RealLit Double | IntLit Int | BoolLit Bool | StringLit String | Func [FormalParams] String
deriving (Show)
type TermEnv = M.Map String Value
but I'm unsure whether I should be using my LangType from before.

Addressing your question in the comments about how to proceed with type checking and evaluation.
If you don't have to do inference or polymorphism, type checking is pretty simple. Also type checking and evaluation mirror each other pretty closely in these conditions.
Begin by defining a monad with the features you need. For a type checker, you will need
A type environment, i.e. a Reader(Map Id LangType) component, to keep track of the types of local variables.
An error ability, e.g. ExceptString.
So you could define a monad like
type TypeEnv = Map.Map Id LangType
type TC = ReaderT TypeEnv (Except String)
And then your typechecker function would look like:
typeCheck :: AST -> TC ()
(We return () because there is nothing interesting to be gained from the typechecking process besides knowing whether the program passed.)
This will be largely structurally inductive, e.g.
typeCheck (Program stmt) = -- typecheckStmt each statement*
typeCheckStmt :: Statement -> TC ()
typeCheckStmt (VariableDecl v type defn) = ...
typeCheckStmt (Assignment v exp) = do
Just t <- asks (Map.lookup v)
t' <- typeCheckExp exp
when (t /= t') $ throwError "Types do not match"
...
-- Return the type of a composite expression to use elsewhere
typeCheckExp :: Expression -> TC LangType
...
There will be a bit of finesse required to make sure that variable declarations in a list of statements can be seen by later statements in the same list. I will leave that as a puzzle. (Hint: see the local function to provide an updated environment within a scope.)
Evaluation is a similar story. You're correct that you need a type of run-time values. Without some cleverness that you are probably not ready for (and is of questionable utility even if you were) there is not really a way to use LangType in Value, so you're on the right track.
You will need a monad that supports keeping track of the values of variables and the ability to do whatever else your language needs. To start I recommend
type Eval = StateT (Map Id Value) IO
and proceed structurally as before. There will again be some finesse required when handling variable scopes and shadowing, and you may need to change the environment type or mess with your Value type a bit to accommodate these subtleties, but thinking through these problems is important. Start simple, don't try to implement typechecking and evaluation for your whole language at once.

Related

Type design for the AST of my language remembering token locations

I wrote a parser and evaluator for a simple programming language. Here is a simplified version of the types for the AST:
data Value = IntV Int | FloatV Float | BoolV Bool
data Expr = IfE Value [Expr] | VarDefE String Value
type Program = [Expr]
I want error messages to tell the line and column of the source code in which the error occured. For example, if the value in an If expression is not a boolean, I want the evaluator to show an error saying "expected boolean at line x, column y", with x and y referring to the location of the value.
So, what I need to do is redefine the previous types so that they can store the relevant locations of different things. One option would be to add a location to each constructor for expressions, like so:
type Location = (Int, Int)
data Expr = IfE Value [Expr] Location | VarDef String Value Location
This clearly isn't optimal, because I have to add those Location fields to every possible expression, and if for example a value contained other values, I would need to add locations to that value too:
{-
this would turn into FunctionCall String [Value] [Location],
with one location for each value in the function call
-}
data Value = ... | FunctionCall String [Value]
I came up with another solution, which allows me to add locations to everything:
data Located a = Located Location a
type LocatedExpr = Located Expr
type LocatedValue = Located Value
data Value = IntV Int | FloatV Float | BoolV Bool | FunctionCall String [LocatedValue]
data Expr = IfE LocatedValue [LocatedExpr] | VarDef String LocatedValue
data Program = [LocatedExpr]
However I don't like this that much. First of all, it clutters the definition of the evaluator and pattern matching has an extra layer every time. Also, I don't think saying that a function call takes located values as arguments is quite right. Function calls should take values as arguments, and locations should be metadata that doesn't interfere with the evaluator.
I need help redefining my types so that the solution is as clean as possible. Maybe there is a language extension or a design pattern I don't know about that could be helpful.
There are many ways to annotate an AST! This is half of what’s known as the AST typing problem, the other half being how you manage an AST that changes over the course of compilation. The problem isn’t exactly “solved”: all of the solutions have tradeoffs, and which one to pick depends on your expected use cases. I’ll go over a few that you might like to investigate at the end.
Whichever method you choose for organising the actual data types, if it makes pattern-matching ugly or unwieldy, the natural solution is PatternSynonyms.
Considering your first example:
{-# Language PatternSynonyms #-}
type Location = (Int, Int)
data Expr
= LocatedIf Value [Expr] Location
| LocatedVarDef String Value Location
-- Unidirectional pattern synonyms which ignore the location:
pattern If :: Value -> [Expr] -> Expr
pattern If val exprs <- LocatedIf val exprs _loc
pattern VarDef :: String -> Value -> Expr
pattern VarDef name expr <- LocatedVarDef name expr _loc
-- Inform GHC that matching ‘If’ and ‘VarDef’ is just as good
-- as matching ‘LocatedIf’ and ‘LocatedVarDef’.
{-# Complete If, VarDef #-}
This may be sufficiently tidy for your purposes already. But here are a few more tips that I find helpful.
Put annotations first: when adding an annotation type to an AST directly, I often prefer to place it as the first parameter of each constructor, so that it can be conveniently partially applied.
data LocatedExpr
= LocatedIf Location Value [Expr]
| LocatedVarDef Location String Value
If the annotation is a location, then this also makes it more convenient to obtain when writing certain kinds of parsers, along the lines of AnnotatedIf <$> (getSourceLocation <* ifKeyword) <*> value <*> many expr in a parser combinator library.
Parameterise your annotations: I often make the annotation type into a type parameter, so that GHC can derive some useful classes for me:
{-# Language
DeriveFoldable,
DeriveFunctor,
DeriveTraversable #-}
data AnnotatedExpr a
= AnnotatedIf a Value [Expr]
| AnnotatedVarDef a String Value
deriving (Functor, Foldable, Traversable)
type LocatedExpr = AnnotatedExpr Location
-- Get the annotation of an expression.
-- (Total as long as every constructor is annotated.)
exprAnnotation :: AnnotatedExpr a -> a
exprAnnotation = head
-- Update annotations purely.
mapAnnotations
:: (a -> b)
-> AnnotatedExpr a -> AnnotatedExpr b
mapAnnotations = fmap
-- traverse, foldMap, &c.
If you want “doesn’t interfere”, use polymorphism: you can enforce that the evaluator can’t inspect the annotation type by being polymorphic over it. Pattern synonyms still let you match on these expressions conveniently:
pattern If :: Value -> [AnnotatedExpr a] -> AnnotatedExpr a
pattern If val exprs <- AnnotatedIf _anno val exprs
-- …
eval :: AnnotatedExpr a -> Value
eval expr = case expr of
If val exprs -> -- …
VarDef name expr -> -- …
Unannotated terms aren’t your enemy: a term without source locations is no good for error reporting, but I think it’s still a good idea to make the pattern synonyms bidirectional for the convenience of constructing unannotated terms with a unit () annotation. (Or something equivalent, if you use e.g. Maybe Location as the annotation type.)
The reason is that this is quite convenient for writing unit tests, where you want to check the output, but want to use Eq instead of pattern matching, and don’t want to have to compare all the source locations in tests that aren’t concerned with them. Using the derived classes, void :: (Functor f) => f a -> f () strips out all the annotations on an AST.
import Control.Monad (void)
type BareExpr = AnnotatedExpr ()
-- One way to define bidirectional synonyms, so e.g.
-- ‘If’ can be used as either a pattern or a constructor.
pattern If :: Value -> [BareExpr] -> BareExpr
pattern If val exprs = AnnotatedIf () val exprs
-- …
stripAnnotations :: AnnotatedExpr a -> BareExpr
stripAnnotations = void
Equivalently, you could use GADTs / ExistentialQuantification to say data AnyExpr where { AnyExpr :: AnnotatedExpr a -> AnyExpr } / data AnyExpr = forall a. AnyExpr (AnnotatedExpr a); that way, the annotations have exactly as much information as (), but you don’t need to fmap over the entire tree with void in order to strip it, just apply the AnyExpr constructor to hide the type.
Finally, here are some brief introductions to a few AST typing solutions.
Annotate each AST node with a tag (e.g. a unique ID), then store all metadata like source locations, types, and whatever else, separately from the AST:
import Data.IntMap (IntMap)
-- More sophisticated/stronglier-typed tags are possible.
newtype Tag = Tag Int
newtype TagMap a = TagMap (IntMap a)
data Expr
= If !Tag Value [Expr]
| VarDef !Tag String Expr
type Span = (Location, Location)
type SourceMap = TagMap Span
type CommentMap = TagMap (Span, String)
parse
:: String -- Input
-> Either ParseError
( Expr -- Parsed expression
, SourceMap -- Source locations of tags
, CommentMap -- Sideband for comments
-- …
)
The advantage is that you can very easily mix in arbitrary new types of annotations anywhere, without affecting the AST itself, and avoid rewriting the AST just to change annotations. You can think of the tree and annotation tables as a kind of database, where the tags are the “foreign keys” relating them. A downside is that you must be careful to maintain these tags when you do rewrite the AST.
I don’t know if this approach has an established name; I think of it as just “tagging” or a “tagged AST”.
recursion-schemes and/or Data Types à la CartePDF: separate out the “recursive” part of an annotated expression tree from the “annotation” part, and use Fix to tie them back together, with Compose (or Cofree) to add annotations in the middle.
data ExprF e
= IfF Value [e]
| VarDefF String e
-- …
deriving (Foldable, Functor, Traversable, …)
-- Unannotated: Expr ~ ExprF (ExprF (ExprF (…)))
type Expr = Fix ExprF
-- With a location at each recursive step:
--
-- LocatedExpr ~ Located (ExprF (Located (ExprF (…))))
type LocatedExpr = Fix (Compose Located ExprF)
data Located a = Located Location a
deriving (Foldable, Functor, Traversable, …)
-- or: type Located = (,) Location
A distinct advantage is that you get a bunch of nice traversal stuff like cata for free-ish, so you can avoid having to write manual traversals over your AST over and over. A downside is that it adds some pattern clutter to clean up, as does the “à la carte” approach, but they do offer a lot of flexibility.
Trees That GrowPDF is overkill for just source locations, but in a serious compiler it’s quite helpful. If you expect to have more than one annotation type (such as inferred types or other analysis results) or an AST that changes over time, then you add a type parameter for the “compilation phase” (parsed, renamed, typechecked, desugared, &c.) and select field types or enable & disable constructors based on that index.
A really unfortunate downside of this is that you often have to rewrite the tree even in places nothing has changed, because everything depends on the “phase”. An alternative that I use is to add one type parameter for each type of phase or annotation that can vary independently, e.g. data Expr annotation termVarName typeVarName, and abstract over that with type and pattern synonyms. This lets you update indices independently and still use classes like Functor and Bitraversable.

Is there a better way to add attribute field into AST in Haskell?

At first, I have a original AST definition like this:
data Expr = LitI Int | LitB Bool | Add Expr Expr
And I want to generalize it so that each AST node can contains some extra attributes:
data Expr a = LitI Int a | LitB Bool a | Add (Expr a) (Expr a) a
In this way, we can easily attach attribute into each node of the AST:
type ExprWithType = Expr TypeRep
type ExprWithSize = Expr Int
But this solution makes it hard to visit the attribute field, we must use pattern matching and process it case by case:
attribute :: Expr a -> a
attribute e = case e of
LitI _ a -> a
LitB _ a -> a
Add _ _ a -> a
We can image that if we can define our AST via a product type of the original AST and the type variable indicating attribute:
type ExprWithType = (Expr, TypeRep)
type ExprWithSize = (Expr, Int)
Then we can simplify the attribute visiting function like this:
attribute = snd
But we know that, the attribute from the outmost product type will not recursively appears in the subtrees.
So, is there a better solution for this problem?
Generally speaking, When we want to extract common field of different cases of a recursive sum type, we met this problem.
You could "lift" the type of the Expr for example like:
data Expr e = LitI Int | LitB Bool | Add e e
Now we can define a data type like:
data ExprAttr a = ExprAttr {
expression :: Expr (ExprAttr a),
attribute :: a
}
So here the ExprAttr has two parameters, the expression, which is thus an Expression that has ExprAttr as in the tree, and attribute which is an a.
You can thus process ExprAttrs, which is an AST of ExprAttrs. If you want to use a "simple" AST, you can define a type like:
newtype SimExpr = SimExpr (Expr SimExpr)
You might want to take a look at Cofree where f will be your recursive data type after abstracting the concept of recursion as an f-algebra and a will be the type of your annotation.
Nate Faubion gave a very accesible talk about this and similar approaches and you can watch it here: https://www.youtube.com/watch?v=eKkxmVFcd74

Haskell AST with recursive types

I'm currently trying to build a lambda calculus solver, and I'm having a slight problem with constructing the AST. A lambda calculus term is inductively defined as:
1) A variable
2) A lambda, a variable, a dot, and a lambda expression.
3) A bracket, a lambda expression, a lambda expression and a bracket.
What I would like to do (and at first tried) is this:
data Expr =
Variable
| Abstract Variable Expr
| Application Expr Expr
Now obviously this doesn't work, since Variable is not a type, and Abstract Variable Expr expects types. So my hacky solution to this is to have:
type Variable = String
data Expr =
Atomic Variable
| Abstract Variable Expr
| Application Expr Expr
Now this is really annoying since I don't like the Atomic Variable on its own, but Abstract taking a string rather than an expr. Is there any way I can make this more elegant, and do it like the first solution?
Your first solution is just an erroneous definition without meaning. Variable is not a type there, it's a nullary value constructor. You can't refer to Variable in a type definition much like you can't refer to any value, like True, False or 100.
The second solution is in fact the direct translation of something we could write in BNF:
var ::= <string>
term ::= λ <var>. <term> | <term> <term> | <var>
And thus there is nothing wrong with it.
What you exactly want is to have some type like
data Expr
= Atomic Variable
| Abstract Expr Expr
| Application Expr Expr
But constrain first Expr in Abstract constructor to be only Atomic. There is no straightforward way to do this in Haskell because value of some type can be created by any constructor of this type. So the only approach is to make some separate data type or type alias for existing type (like your Variable type alias) and move all common logic into it. Your solution with Variable seems very ok to me.
But. You can use some other advanced features in Haskell to achieve you goal in different way. You can be inspired by glambda package which uses GADT to create typed lambda calculus. Also see this answer: https://stackoverflow.com/a/39931015/2900502
I can come up with next solution to achieve you minimal goals (if you only want to constrain first argument of Abstract):
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
data AtomicT
data AbstractT
data ApplicationT
data Expr :: * -> * where
Atomic :: String -> Expr AtomicT
Abstract :: Expr AtomicT -> Expr a -> Expr AbstractT
Application :: Expr a -> Expr b -> Expr ApplicationT
And next example works fine:
ex1 :: Expr AbstractT
ex1 = Abstract (Atomic "x") (Atomic "x")
But this example won't compile because of type mismatch:
ex2 :: Expr AbstractT
ex2 = Abstract ex1 ex1

Annotating Nested ADT in Haskell with Additional Data

While writing a compiler in Haskell, I have come across a particular issue a couple times now while working with nested data types. I will have an ADT defined something like
data AST = AST [GlobalDecl]
data GlobalDecl = Func Type Identifier [Stmt] | ...
data Stmt = Assign Identifier Exp | ...
data Exp = Var Identifier | ...
While performing some transformation on the AST, I might want to briefly carry around some extra data with variables that are used with in an expression. All of the options for doing this that I have considered so far seem to be fairly awkward. I could make a new data type:
data Exp' = Var' Identifier ExtraInfo | ...
but this means I would need a new definitions Stmt', GDecl', in order to form the slightly changed AST'. Another option is to add another data constructor to the original Exp, but only use it in that one particular part of the program:
data Exp = Var Identifier | Var' Identifier ExtraInfo | ...
If you do this, the typechecker can no longer prevent you from mistakenly using Var' in some other part the program.
A third option would be to simply keep the extra information around all the time, even though it has no relevance to the rest of the program:
data Exp = Var Identifier ExtraInfo | ...
Doable, but it's ugly, particularly if you only need the extra information briefly. For now I have just been putting the extra info in a Map Indentifier ExtraInfo, and carrying it around with the AST, either explicitly or via the state monad. This can get awkward fast, if, for instance, you need to annotate different occurances of the same Identifier with different info.
Does anyone have any elegant techniques for annotating nested data types?
One option to tag a structure with extra data is to use a higher kinded type parameter. If you only ever need to tag variables, you can do e.g.
data AST f = AST [GlobalDecl f]
data GlobalDecl f = Func Type Identifier [Stmt f] | ...
data Stmt = Assign Identifier (Exp f) | ...
data Exp f = Var (f Identifier) | ...
This is similar to what Peter suggested but instead of making the types fully generic it only parametricizes the part you want to vary.
You'll get your original, untagged structure with AST Identity or you can have a type like AST ((,) ExtraInfo) which would turn Var (f Identifier) into Var (ExtraInfo, Identifier).
If you need to tag every level of the AST with some extra information (e.g. token positions), you could even define the data type as
data AST f = AST [f (GlobalDecl f)]
data GlobalDecl f = Func (f (Type f)) (f (Identifier f)) [f (Stmt f)] | ...
data Stmt f = Assign (f (Identifier f)) (f (Exp f)) | ...
data Exp f = Var (f (Identifier f)) | ...
Now AST ((,) ExtraInfo) would contain extra information at every branching point in the syntax tree (granted, working with the above structure will get a bit cumbersome).
If you make all of your types more polymorphic, like this:
data AST a = AST a
data GlobalDecl t i s = Func t i [s] | ...
data Stmt i e = Assign i e | ...
data Exp a = Var a | ...
then you can temporarily instantiate them with a tuple - e.g. Exp (Int, Identifier) - for intermediate computations. If necessary, you can make newtypes, for the concrete types above, for convenience.

haskell type,new type or data for only an upper case char

If i want to make a String but holds only an uppercase character. I know that String is a [Char]. I have tried something like type a = ['A'..'Z'] but it did not work any help?
What you're wanting is dependent types, which Haskell doesn't have. Dependent types are those that depend on values, so using dependent types you could encode at the type level a vector with length 5 as
only5 :: Vector 5 a -> Vector 10 a
only5 vec = concatenate vec vec
Again, Haskell does not have dependent types, but languages like Agda, Coq and Idris do support them. Instead, you could just use a "smart constructor"
module MyModule
( Upper -- export type only, not constructor
, mkUpper -- export the smart constructor
) where
import Data.Char (isUpper)
newtype Upper = Upper String deriving (Eq, Show, Read, Ord)
mkUpper :: String -> Maybe Upper
mkUpper s = if all isUpper s then Just (Upper s) else Nothing
Here the constructor Upper is not exported, just the type, and then users of this module have to use the mkUpper function that safely rejects non-uppercase strings.
For clarification, and to show how awesome dependent types can be, consider the mysterious concatenate function from above. If I were to define this with dependent types, it would actually look something like
concatenate :: Vector n a -> Vector m a -> Vector (n + m) a
concatenate v1 v2 = undefined
Wait, what's arithmetic doing in a type signature? It's actually performing type-system level computations on the values that this type is dependent on. This removes a lot of potential boilerplate in Haskell, and it makes guarantees at compilation time that, e.g., arrays can't have negative length.
Most desires for dependent types can be filled either using smart constructors (see bheklilr's answer), generating Haskell from an external tool (Coq, Isabelle, Inch, etc), or using an exact representation. You probably want the first solution.
To exactly represent just the capitals then you could write a data type that includes a constructor for each letter and conversion to/from strings:
data Capital = CA | CB | CC | CD | CE | CF | CG | CH | CI | CJ | CK | CL | CM | CN | CO | CP | CQ | CR | CS | CT | CU | CV | CW | CX | CY | CZ deriving (Eq, Ord, Enum)
toString :: [Capital] -> String
toString = map (toEnum . (+ (fromEnum 'A')) . fromEnum)
You can even go a step further and allow conversion from string literals, "Anything in quotes", to a type [Capitals] by using the OverloadedStrings extension. Just add to the top of your file {-# LANGUAGE OverloadedStrings, FlexibleInstances #-}, be sure to import Data.String and write the instance:
type Capitals = [Capital]
instance IsString Capitals where
fromString = map (toEnum . (subtract (fromEnum 'A')) . fromEnum) . filter (\x -> 'A' <= x && x <= 'Z')
After that, you can type capitals all you want!
*Main> toString ("jfoeaFJOEW" :: Capitals)
"FJOEW"
*Main>
bheklilr is correct but perhaps for your purposes the following could be OK:
import Data.Char(toUpper)
newtype UpperChar = UpperChar Char
deriving (Show)
upperchar :: Char -> UpperChar
upperchar = UpperChar. toUpper
You can alternatively make UpperChar an alias of Char (use type instead of newtype) which would allow you to forms lists of both Char and UpperChar. The problem with an alias, however, is that you could feed a Char into a function expecting an UpperChar...
One way to do something similar which will work well for the Latin script of your choice but not so well as a fully general solution is to use a custom type to represent upper case letters. Something like this should do the trick:
data UpperChar = A|B|C|D| (fill in the rest) | Y | Z deriving (Enum, Eq, Ord, Show)
newtype UpperString = UpperString [UpperChar]
instance Show UpperString
show (UpperString s) = map show s
The members of this type are not Haskell Strings, but you can convert between them as needed.

Resources