Implementing a language interpreter in Haskell - haskell

I want to implement an imperative language interpreter in Haskell (for educational purposes). But it's difficult for me to create right architecture for my interpreter: How should I store variables? How can I implement nested function calls? How should I implement variable scoping? How can I add debugging possibilities in my language? Should I use monads/monad transformers/other techniques? etc.
Does anybody know good articles/papers/tutorials/sources on this subject?

If you are new to writing this kind of processors, I would recommend to put off using monads for a while and first focus on getting a barebones implementation without any bells or whistles.
The following may serve as a minitutorial.
I assume that you have already tackled the issue of parsing the source text of the programs you want to write an interpreter for and that you have some types for capturing the abstract syntax of your language. The language that I use here is very simple and only consists of integer expressions and some basic statements.
Preliminaries
Let us first import some modules that we will use in just a bit.
import Data.Function
import Data.List
The essence of an imperative language is that it has some form of mutable variables. Here, variables simply represented by strings:
type Var = String
Expressions
Next, we define expressions. Expressions are constructed from integer constants, variable references, and arithmetic operations.
infixl 6 :+:, :-:
infixl 7 :*:, :/:
data Exp
= C Int -- constant
| V Var -- variable
| Exp :+: Exp -- addition
| Exp :-: Exp -- subtraction
| Exp :*: Exp -- multiplication
| Exp :/: Exp -- division
For example, the expression that adds the constant 2 to the variable x is represented by V "x" :+: C 2.
Statements
The statement language is rather minimal. We have three forms of statements: variable assignments, while loops, and sequences.
infix 1 :=
data Stmt
= Var := Exp -- assignment
| While Exp Stmt -- loop
| Seq [Stmt] -- sequence
For example, a sequence of statements for "swapping" the values of the variables x and y can be represented by Seq ["tmp" := V "x", "x" := V "y", "y" := V "tmp"].
Programs
A program is just a statement.
type Prog = Stmt
Stores
Now, let us move to the actual interpreter. While running a program, we need to keep track of the values assigned to the different variables in the programs. These values are just integers and as a representation of our "memory" we just use lists of pairs consisting of a variable and a value.
type Val = Int
type Store = [(Var, Val)]
Evaluating expressions
Expressions are evaluated by mapping constants to their value, looking up the values of variables in the store, and mapping arithmetic operations to their Haskell counterparts.
eval :: Exp -> Store -> Val
eval (C n) r = n
eval (V x) r = case lookup x r of
Nothing -> error ("unbound variable `" ++ x ++ "'")
Just v -> v
eval (e1 :+: e2) r = eval e1 r + eval e2 r
eval (e1 :-: e2) r = eval e1 r - eval e2 r
eval (e1 :*: e2) r = eval e1 r * eval e2 r
eval (e1 :/: e2) r = eval e1 r `div` eval e2 r
Note that if the store contains multiple bindings for a variable, lookup selects the bindings that comes first in the store.
Executing statements
While the evaluation of an expression cannot alter the contents of the store, executing a statement may in fact result in an update of the store. Hence, the function for executing a statement takes a store as an argument and produces a possibly updated store.
exec :: Stmt -> Store -> Store
exec (x := e) r = (x, eval e r) : r
exec (While e s) r | eval e r /= 0 = exec (Seq [s, While e s]) r
| otherwise = r
exec (Seq []) r = r
exec (Seq (s : ss)) r = exec (Seq ss) (exec s r)
Note that, in the case of assignments, we simply push a new binding for the updated variable to the store, effectively shadowing any previous bindings for that variable.
Top-level Interpreter
Running a program reduces to executing its top-level statement in the context of an initial store.
run :: Prog -> Store -> Store
run p r = nubBy ((==) `on` fst) (exec p r)
After executing the statement we clean up any shadowed bindings, so that we can easily read off the contents of the final store.
Example
As an example, consider the following program that computes the Fibonacci number of the number stored in the variable n and stores its result in the variable x.
fib :: Prog
fib = Seq
[ "x" := C 0
, "y" := C 1
, While (V "n") $ Seq
[ "z" := V "x" :+: V "y"
, "x" := V "y"
, "y" := V "z"
, "n" := V "n" :-: C 1
]
]
For instance, in an interactive environment, we can now use our interpreter to compute the 25th Fibonacci number:
> lookup "x" $ run fib [("n", 25)]
Just 75025
Monadic Interpretation
Of course, here, we are dealing with a very simple and tiny imperative language. As your language gets more complex, so will the implementation of your interpreter. Think for example about what additions you need when you add procedures and need to distinguish between local (stack-based) storage and global (heap-based) storage. Returning to that part of your question, you may then indeed consider the introduction of monads to streamline the implementation of your interpreter a bit.
In the example interpreter above, there are two "effects" that are candidates for being captured by a monadic structure:
The passing around and updating of the store.
Aborting running the program when a run-time error is encountered. (In the implementation above, the interpreter simply crashes when such an error occurs.)
The first effect is typically captured by a state monad, the second by an error monad. Let us briefly investigate how to do this for our interpreter.
We prepare by importing just one more module from the standard libraries.
import Control.Monad
We can use monad transformers to construct a composite monad for our two effects by combining a basic state monad and a basic error monad. Here, however, we simply construct the composite monad in one go.
newtype Interp a = Interp { runInterp :: Store -> Either String (a, Store) }
instance Monad Interp where
return x = Interp $ \r -> Right (x, r)
i >>= k = Interp $ \r -> case runInterp i r of
Left msg -> Left msg
Right (x, r') -> runInterp (k x) r'
fail msg = Interp $ \_ -> Left msg
Edit 2018: The Applicative Monad Proposal
Since the Applicative Monad Proposal (AMP) every Monad must also be an instance of Functor and Applicative. To do this we can add
import Control.Applicative -- Otherwise you can't do the Applicative instance.
to the imports and make Interp an instance of Functor and Applicative like this
instance Functor Interp where
fmap = liftM -- imported from Control.Monad
instance Applicative Interp where
pure = return
(<*>) = ap -- imported from Control.Monad
Edit 2018 end
For reading from and writing to the store, we introduce effectful functions rd and wr:
rd :: Var -> Interp Val
rd x = Interp $ \r -> case lookup x r of
Nothing -> Left ("unbound variable `" ++ x ++ "'")
Just v -> Right (v, r)
wr :: Var -> Val -> Interp ()
wr x v = Interp $ \r -> Right ((), (x, v) : r)
Note that rd produces a Left-wrapped error message if a variable lookup fails.
The monadic version of the expression evaluator now reads
eval :: Exp -> Interp Val
eval (C n) = do return n
eval (V x) = do rd x
eval (e1 :+: e2) = do v1 <- eval e1
v2 <- eval e2
return (v1 + v2)
eval (e1 :-: e2) = do v1 <- eval e1
v2 <- eval e2
return (v1 - v2)
eval (e1 :*: e2) = do v1 <- eval e1
v2 <- eval e2
return (v1 * v2)
eval (e1 :/: e2) = do v1 <- eval e1
v2 <- eval e2
if v2 == 0
then fail "division by zero"
else return (v1 `div` v2)
In the case for :/:, division by zero results in an error message being produced through the Monad-method fail, which, for Interp, reduces to wrapping the message in a Left-value.
For the execution of statements we have
exec :: Stmt -> Interp ()
exec (x := e) = do v <- eval e
wr x v
exec (While e s) = do v <- eval e
when (v /= 0) (exec (Seq [s, While e s]))
exec (Seq []) = do return ()
exec (Seq (s : ss)) = do exec s
exec (Seq ss)
The type of exec conveys that statements do not result in values but are executed only for their effects on the store or the run-time errors they may trigger.
Finally, in the function run we perform a monadic computation and process its effects.
run :: Prog -> Store -> Either String Store
run p r = case runInterp (exec p) r of
Left msg -> Left msg
Right (_, r') -> Right (nubBy ((==) `on` fst) r')
In the interactive environment, we can now revisit the interpretation of our example program:
> lookup "x" `fmap` run fib [("n", 25)]
Right (Just 75025)
> lookup "x" `fmap` run fib []
Left "unbound variable `n'"

A couple of good papers I've finally found:
Building Interpreters by Composing Monads
Monad Transformers Step by Step - how incrementally build tiny interpreter using
monad transformers
How to build a monadic interpreter in one day
Monad Transformers and Modular Interpreters

Related

Is this syntax as expressive as the do-notation?

The do notation allows us to express monadic code without overwhelming nestings, so that
main = getLine >>= \ a ->
getLine >>= \ b ->
putStrLn (a ++ b)
can be expressed as
main = do
a <- getLine
b <- getLine
putStrLn (a ++ b)
Suppose, though, the syntax allows ... #expression ... to stand for do { x <- expression; return (... x ...) }. For example, foo = f a #(b 1) c would be desugared as: foo = do { x <- b 1; return (f a x c) }. The code above could, then, be expressed as:
main = let a = #getLine in
let b = #getLine in
putStrLn (a ++ b)
Which would be desugared as:
main = do
x <- getLine
let a = x in
return (do
x' <- getLine
let b = x' in
return (putStrLn (a ++ b)))
That is equivalent. This syntax is appealing to me because it seems to offer the same functionality as the do-notation, while also allowing some shorter expressions such as:
main = putStrLn (#(getLine) ++ #(getLine))
So, I wonder if there is anything defective with this proposed syntax, or if it is indeed complete and equivalent to the do-notation.
putStrLn is already String -> IO (), so your desugaring ... return (... return (putStrLn (a ++ b))) ends up having type IO (IO (IO ())), which is likely not what you wanted: running this program won't print anything!
Speaking more generally, your notation can't express any do-block which doesn't end in return. [See Derek Elkins' comment.]
I don't believe your notation can express join, which can be expressed with do without any additional functions:
join :: Monad m => m (m a) -> m a
join mx = do { x <- mx; x }
However, you can express fmap constrained to Monad:
fmap' :: Monad m => (a -> b) -> m a -> m b
fmap' f mx = f #mx
and >>= (and thus everything else) can be expressed using fmap' and join. So adding join would make your notation complete, but still not convenient in many cases, because you end up needing a lot of joins.
However, if you drop return from the translation, you get something quite similar to Idris' bang notation:
In many cases, using do-notation can make programs unnecessarily verbose, particularly in cases such as m_add above where the value bound is used once, immediately. In these cases, we can use a shorthand version, as follows:
m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = pure (!x + !y)
The notation !expr means that the expression expr should be evaluated and then implicitly bound. Conceptually, we can think of ! as being a prefix function with the following type:
(!) : m a -> a
Note, however, that it is not really a function, merely syntax! In practice, a subexpression !expr will lift expr as high as possible within its current scope, bind it to a fresh name x, and replace !expr with x. Expressions are lifted depth first, left to right. In practice, !-notation allows us to program in a more direct style, while still giving a notational clue as to which expressions are monadic.
For example, the expression:
let y = 42 in f !(g !(print y) !x)
is lifted to:
let y = 42 in do y' <- print y
x' <- x
g' <- g y' x'
f g'
Adding it to GHC was discussed, but rejected (so far). Unfortunately, I can't find the threads discussing it.
How about this:
do a <- something
b <- somethingElse a
somethingFinal a b

How to prove this Haskell code using equational reasoning

I found this exercise on equational reasoning and proofs in Haskell. The following code is given:
type Stack = [Int]
type Code = [Op]
data Op = PUSH Int | ADD
deriving (Show)
--
-- Stack machine
--
exec :: Code -> Stack -> Stack
exec [ ] s = s
exec (PUSH n : c) s = exec c (n:s)
exec (ADD:c) (m:n:s) = exec c (n+m : s)
--
-- Interpeter
--
data Expr = Val Int | Add Expr Expr
deriving (Show)
eval :: Expr -> Int
eval (Val n) = n
eval (Add x y) = eval x+eval y
--
-- Compiler
--
comp :: Expr -> Code
comp (Val n) = [PUSH n]
comp (Add x y) = comp x ++ comp y ++ [ADD]
Now I have to prove that exec(comp e) s = eval e : s.
So I found this answer so far:
We have to prove that exec (comp e) s = eval e : s.
First case: Assume e = (Val n). Then comp (Val n) = [PUSH n], so we have to prove that exec ([PUSH n]) s = eval ([PUSH n] : s). We find that exec ([PUSH n]) s = exec [] (n:s) = (n:s) using the function definition of exec.
Now eval (Val n) : s = n : s. The first case is OK!
Second case: Assume e = (Add x y). Then comp (Add x y) = comp x ++ comp y ++ [ADD].
But now I'm struggling with this recursive use of comp. Should I be using some form of trees and induction on these trees to prove this? I'm not completely sure how to do that.
When the first argument to exec is a list, the two possibilities are:
exec (PUSH n: codes) -- #1
exec (ADD : codes) -- #2
In the induction step you get to assume that the proposition holds for codes, i.e. you may assume:
exec codes s = eval codes : s
for any value of s -- Keep this in mind - this is usually the key step in any induction proof.
Start by expanding #1 using the code you've written for exec:
exec (PUSH n: codes) s == exec codes (n:s)
== ...
== ...
== eval (PUSH n: codes) : s
Can you see a place to use the induction hypothesis?

Creating an Interpreter in Haskell

So, I need to write a function evalS :: Statement -> Store -> Store that takes as input a statement and a store and returns a possibly modified store.
The following case has been given to me:
evalS w#(While e s1) s = case (evalE e s) of
(BoolVal True,s') -> let s'' = evalS s1 s' in evalS w s''
(BoolVal False,s') -> s'
_ -> error "Condition must be a BoolVal
And I need to write:
evalS Skip s = ...
evalS (Expr e) s = ...
evalS (Sequence s1 s2) s = ...
evalS (If e s1 s2) s = ...
In the If case, if e evaluates to a non-boolean value, I need to throw an error using the error function.
Sample input/output:
> run stmtParser "x=1+1" evalS
fromList [("x",2)]
> run stmtParser "x = 2; x = x + 3" evalS
fromList [("x",5)]
> run stmtParser "if true then x = 1 else x = 2 end" evalS
fromList [("x",1)]
> run stmtParser "x=2; y=x + 3; if y < 4 then z = true else z = false end" evalS
fromList [("x",2),("y",5),("z",false)]
> run stmtParser "x = 1; while x < 3 do x = x + 1 end" evalS
fromList [("x",3)]
> run stmtParser "x = 1 ; y = 1; while x < 5 do x = x + 1 ; y = y * x end" evalS
fromList [("x",5),("y",120)]
Code for stmtParser:
-- Sequence of statements
stmtParser :: Parser Statement
stmtParser = stmtParser1 `chainl1` (P.semi lexer >> return Sequence)
-- Single statements
stmtParser1 :: Parser Statement
stmtParser1 = (Expr <$> exprParser)
<|> do
P.reserved lexer "if"
cond <- exprParser
P.reserved lexer "then"
the <- stmtParser
P.reserved lexer "else"
els <- stmtParser
P.reserved lexer "end"
return (If cond the els)
<|> do
P.reserved lexer "while"
cond <- exprParser
P.reserved lexer "do"
body <- stmtParser
P.reserved lexer "end"
return (While cond body)
WHAT I'VE TRIED:
I am not sure if I need to use evalE in this problem or not. I have written it in a previous problem. The signature for evalE is evalE :: Expression -> Store -> (Value, Store) and asked me to write:
evalE (Var x) s = ...
evalE (Val v) s = ...
evalE (Assignment x e) s = ...
I have done the above, already.
ATTEMPT:
evalS Skip s = show s -- I am assuming that since Skip returns an empty String, I just need to return an empty String.
evalS (Sequence s1 s2) s = evalS s1 >> evalS s2 -- sequence1 then sequence2. I am not quite sure what to do with the s.
evalS (Expr e) s = ... Not sure what to do, here.
evalS (If e s1 s2) s = do
x <- evalE e
case x of
BoolVal True -> evalS s1
BoolVal False -> evalS s2
I am having trouble writing the above statements.
For reference, here is the entire skeleton that was given to me to work with:
-- Necessary imports
import Control.Applicative ((<$>),liftA,liftA2)
import Data.Map
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P
--------- AST Nodes ---------
-- Variables are identified by their name as string
type Variable = String
-- Values are either integers or booleans
data Value = IntVal Int -- Integer value
| BoolVal Bool -- Boolean value
-- Expressions are variables, literal values, unary and binary operations
data Expression = Var Variable -- e.g. x
| Val Value -- e.g. 2
| BinOp Op Expression Expression -- e.g. x + 3
| Assignment Variable Expression -- e.g. x = 3
-- Statements are expressions, conditionals, while loops and sequences
data Statement = Expr Expression -- e.g. x = 23
| If Expression Statement Statement -- if e then s1 else s2 end
| While Expression Statement -- while e do s end
| Sequence Statement Statement -- s1; s2
| Skip -- no-op
-- All binary operations
data Op = Plus -- + :: Int -> Int -> Int
| Minus -- - :: Int -> Int -> Int
| Times -- * :: Int -> Int -> Int
| GreaterThan -- > :: Int -> Int -> Bool
| Equals -- == :: Int -> Int -> Bool
| LessThan -- < :: Int -> Int -> Bool
-- The `Store` is an associative map from `Variable` to `Value` representing the memory
type Store = Map Variable Value
--------- Parser ---------
-- The Lexer
lexer = P.makeTokenParser (emptyDef {
P.identStart = letter,
P.identLetter = alphaNum,
P.reservedOpNames = ["+", "-", "*", "!", ">", "=", "==", "<"],
P.reservedNames = ["true", "false", "if", "in", "then", "else", "while", "end", "to", "do", "for"]
})
-- The Parser
-- Number literals
numberParser :: Parser Value
numberParser = (IntVal . fromIntegral) <$> P.natural lexer
-- Boolean literals
boolParser :: Parser Value
boolParser = (P.reserved lexer "true" >> return (BoolVal True))
<|> (P.reserved lexer "false" >> return (BoolVal False))
-- Literals and Variables
valueParser :: Parser Expression
valueParser = Val <$> (numberParser <|> boolParser)
<|> Var <$> P.identifier lexer
-- -- Expressions
exprParser :: Parser Expression
exprParser = liftA2 Assignment
(try (P.identifier lexer >>= (\v ->
P.reservedOp lexer "=" >> return v)))
exprParser
<|> buildExpressionParser table valueParser
where table = [[Infix (op "*" (BinOp Times)) AssocLeft]
,[Infix (op "+" (BinOp Plus)) AssocLeft]
,[Infix (op "-" (BinOp Minus)) AssocLeft]
,[Infix (op ">" (BinOp GreaterThan)) AssocLeft]
,[Infix (op "==" (BinOp Equals)) AssocLeft]
,[Infix (op "<" (BinOp LessThan)) AssocLeft]]
op name node = (P.reservedOp lexer name) >> return node
-- Sequence of statements
stmtParser :: Parser Statement
stmtParser = stmtParser1 `chainl1` (P.semi lexer >> return Sequence)
-- Single statements
stmtParser1 :: Parser Statement
stmtParser1 = (Expr <$> exprParser)
<|> do
P.reserved lexer "if"
cond <- exprParser
P.reserved lexer "then"
the <- stmtParser
P.reserved lexer "else"
els <- stmtParser
P.reserved lexer "end"
return (If cond the els)
<|> do
P.reserved lexer "while"
cond <- exprParser
P.reserved lexer "do"
body <- stmtParser
P.reserved lexer "end"
return (While cond body)
-------- Helper functions --------
-- Lift primitive operations on IntVal and BoolVal values
liftIII :: (Int -> Int -> Int) -> Value -> Value -> Value
liftIII f (IntVal x) (IntVal y) = IntVal $ f x y
liftIIB :: (Int -> Int -> Bool) -> Value -> Value -> Value
liftIIB f (IntVal x) (IntVal y) = BoolVal $ f x y
-- Apply the correct primitive operator for the given Op value
applyOp :: Op -> Value -> Value -> Value
applyOp Plus = liftIII (+)
applyOp Minus = liftIII (-)
applyOp Times = liftIII (*)
applyOp GreaterThan = liftIIB (>)
applyOp Equals = liftIIB (==)
applyOp LessThan = liftIIB (<)
-- Parse and print (pp) the given WHILE programs
pp :: String -> IO ()
pp input = case (parse stmtParser "" input) of
Left err -> print err
Right x -> print x
-- Parse and run the given WHILE programs
run :: (Show v) => (Parser n) -> String -> (n -> Store -> v) -> IO ()
run parser input eval = case (parse parser "" input) of
Left err -> print err
Right x -> print (eval x empty)
It's a little difficult to answer your question, because you didn't actually ask one. Let me just pick out a few of the things that you've said, in order to give you a few clues.
I am not sure if I need to use evalE in this problem or not. I have written it in a previous problem. The signature for evalE is evalE :: Expression -> Store -> (Value, Store)
evalS (Expr e) s = ... Not sure what to do, here.
What does it mean to execute a statement which consists of an expression? If it has something to do with evaluating the expression, then the fact that you have an expression evaluator available might help with "what to do, here".
Next, compare the code you've been given for "while" (which contains a fine example of a sensible thing to do with an expression, by the way)...
evalS w#(While e s1) s = case (evalE e s) of`
(BoolVal True,s') -> let s'' = evalS s1 s' in evalS w s''
(BoolVal False,s') -> s'
_ -> error "Condition must be a BoolVal"
...and compare it with your code for "if"
evalS (If e s1 s2) s = do
x <- evalE e
case x of
BoolVal True -> evalS s1
BoolVal False -> evalS s2
Your code is in a rather different style — the "monadic" style. Where are you getting that from? It would make sense if the types of the evaluators were something like
evalE :: Expression -> State Store Value
evalS :: Statement -> State Store ()
The monadic style is a very nice way to thread the mutating store through the evaluation process without talking about it too much. E.g., your x <- evalE e means "let x be the result of evaluating e (quietly receiving the initial store and passing along the resulting store)". It's a good way to work which I expect you'll explore in due course.
But those aren't the types you've been given, and the monadic style is not appropriate. You have
evalE :: Expression -> Store -> (Value, Store)
evalS :: Statement -> Store -> Store
and the example code threads the store explicitly. Look again
evalS w#(While e s1) s = case (evalE e s) of`
(BoolVal True,s') -> let s'' = evalS s1 s' in evalS w s''
(BoolVal False,s') -> s'
_ -> error "Condition must be a BoolVal"
See? evalS receives its initial store, s, explicitly, and uses it explicitly in evalE e s. The resulting new store is called s' in both case branches. If the loop is over, then s' is given back as the final store. Otherwise, s' is used as the store for one pass through the loop body, s1, and the store s'' resulting from that is used for the next time around the loop, w.
Your code will need to be similarly explicit in the way it names and uses the store at each stage of evaluation. Let's walk through.
evalS Skip s = show s -- I am assuming that since Skip returns an empty String, I just need to return an empty String.
You assume incorrectly. The evalS function does not return a String, empty or otherwise: it returns a Store. Now, which Store? Your initial store is s: how will the store after "skip" relate to s?
evalS (Sequence s1 s2) s = evalS s1 >> evalS s2 -- sequence1 then sequence2. I am not quite sure what to do with the s.
Again, that's a monadic approach which does not fit with this question. You need to thread the store, intially s, through the process of evaluating statements s1 and s2 in sequence. The "while" case has a good example of how to do something like that.
evalS (Expr e) s = ... Not sure what to do, here.
Again, the "while" example shows you one way to extract a value and an updated store by evaluating an expression. Food for thought, isn't it?
evalS (If e s1 s2) s = ...
Now "if" starts out by evaluating a condition, rather a lot like "while", no?
So, my advice amounts to this:
drop the monadic style code for now, but come back to it later when it's appropriate;
read the example implementation of "while" and understand how it treats expressions and sequences of statements, passing the store explicitly;
deploy the similar techniques to implement the other constructs.
The person who set the question has been kind enough to give you code which gives an example of everything you will need. Please reciprocate that kindness by comprehending and then taking the hint!
Since this looks as homework I'll just provide a few small hints, leaving the real work for you.
I am not sure if I need to use evalE in this problem or not.
Yes, you'll have to. In your language, an expression e modifies the store and returns a value: you can tell that from evalE returning a pair (Value,Store)
By comparison, the statement Expr e modifies the store without returning a value. To obtain the latter (statement evaluation) from the former (expression) all you need to do is to throw away what you do not need.
About your attempt:
evalS Skip s = show s -- I am assuming that since Skip returns an empty String, I just need to return an empty String.
Why a string? Does evalS return strings? If not, what it returns? You are doing far more work than you have to, here.
evalS (Sequence s1 s2) s = evalS s1 >> evalS s2 -- sequence1 then sequence2. I am not quite sure what to do with the s.
OK, the idea is right but the code is not. Forget about monads and >>, just think about the stores. You make two recursive calls evalS s1 and evalS s2: these look wrong because evalS expects two arguments (statement and store), and you only provide one.
And -- before you try it -- no, passing s to both of them would still be wrong. In which store is the first statement evaluated? What about the second?
evalS (Expr e) s = ... Not sure what to do, here.
See the discussion above.
evalS (If e s1 s2) s = do
x <- evalE e
case x of
BoolVal True -> evalS s1
BoolVal False -> evalS s2
Avoid monad-related operations, do and <-. There might be a way to use those to solve this task, but I'd not recommend to try that path to a beginner.
You can use let if you want to name intermediate results.
evalE takes two arguments, not one. Mind that it returns a pair, not a value. evalS takes two arguments.

Trying to apply CPS to an interpreter

I'm trying to use CPS to simplify control-flow implementation in my Python interpreter. Specifically, when implementing return/break/continue, I have to store state and unwind manually, which is tedious. I've read that it's extraordinarily tricky to implement exception handling in this way. What I want is for each eval function to be able to direct control flow to either the next instruction, or to a different instruction entirely.
Some people with more experience than me suggested looking into CPS as a way to deal with this properly. I really like how it simplifies control flow in the interpreter, but I'm not sure how much I need to actually do in order to accomplish this.
Do I need to run a CPS transform on the AST? Should I lower this AST into a lower-level IR that is smaller and then transform that?
Do I need to update the evaluator to accept the success continuation everywhere? (I'm assuming so).
I think I generally understand the CPS transform: the goal is to thread the continuation through the entire AST, including all expressions.
I'm also a bit confused where the Cont monad fits in here, as the host language is Haskell.
Edit: here's a condensed version of the AST in question. It is a 1-1 mapping of Python statements, expressions, and built-in values.
data Statement
= Assignment Expression Expression
| Expression Expression
| Break
| While Expression [Statement]
data Expression
| Attribute Expression String
| Constant Value
data Value
= String String
| Int Integer
| None
To evaluate statements, I use eval:
eval (Assignment (Variable var) expr) = do
value <- evalExpr expr
updateSymbol var value
eval (Expression e) = do
_ <- evalExpr e
return ()
To evaluate expressions, I use evalExpr:
evalExpr (Attribute target name) = do
receiver <- evalExpr target
attribute <- getAttr name receiver
case attribute of
Just v -> return v
Nothing -> fail $ "No attribute " ++ name
evalExpr (Constant c) = return c
What motivated the whole thing was the shenanigans required for implementing break. The break definition is reasonable, but what it does to the while definition is a bit much:
eval (Break) = do
env <- get
when (loopLevel env <= 0) (fail "Can only break in a loop!")
put env { flow = Breaking }
eval (While condition block) = do
setup
loop
cleanup
where
setup = do
env <- get
let level = loopLevel env
put env { loopLevel = level + 1 }
loop = do
env <- get
result <- evalExpr condition
when (isTruthy result && flow env == Next) $ do
evalBlock block
-- Pretty ugly! Eat continue.
updatedEnv <- get
when (flow updatedEnv == Continuing) $ put updatedEnv { flow = Next }
loop
cleanup = do
env <- get
let level = loopLevel env
put env { loopLevel = level - 1 }
case flow env of
Breaking -> put env { flow = Next }
Continuing -> put env { flow = Next }
_ -> return ()
I am sure there are more simplifications that can be done here, but the core problem is one of stuffing state somewhere and manually winding out. I'm hoping that CPS will let me stuff book-keeping (like loop exit points) into state and just use those when I need them.
I dislike the split between statements and expressions and worry it might make the CPS transform more work.
This finally gave me a good excuse to try using ContT!
Here's one possible way of doing this: store (in a Reader wrapped in ContT) the continuation of exiting the current (innermost) loop:
newtype M r a = M{ unM :: ContT r (ReaderT (M r ()) (StateT (Map Id Value) IO)) a }
deriving ( Functor, Applicative, Monad
, MonadReader (M r ()), MonadCont, MonadState (Map Id Value)
, MonadIO
)
runM :: M a a -> IO a
runM m = evalStateT (runReaderT (runContT (unM m) return) (error "not in a loop")) M.empty
withBreakHere :: M r () -> M r ()
withBreakHere act = callCC $ \break -> local (const $ break ()) act
break :: M r ()
break = join ask
(I've also added IO for easy printing in my toy interpreter, and State (Map Id Value) for variables).
Using this setup, you can write Break and While as:
eval Break = break
eval (While condition block) = withBreakHere $ fix $ \loop -> do
result <- evalExpr condition
unless (isTruthy result)
break
evalBlock block
loop
Here's the full code for reference:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Interp where
import Prelude hiding (break)
import Control.Applicative
import Control.Monad.Cont
import Control.Monad.State
import Control.Monad.Reader
import Data.Function
import Data.Map (Map)
import qualified Data.Map as M
import Data.Maybe
type Id = String
data Statement
= Print Expression
| Assign Id Expression
| Break
| While Expression [Statement]
| If Expression [Statement]
deriving Show
data Expression
= Var Id
| Constant Value
| Add Expression Expression
| Not Expression
deriving Show
data Value
= String String
| Int Integer
| None
deriving Show
data Env = Env{ loopLevel :: Int
, flow :: Flow
}
data Flow
= Breaking
| Continuing
| Next
deriving Eq
newtype M r a = M{ unM :: ContT r (ReaderT (M r ()) (StateT (Map Id Value) IO)) a }
deriving ( Functor, Applicative, Monad
, MonadReader (M r ()), MonadCont, MonadState (Map Id Value)
, MonadIO
)
runM :: M a a -> IO a
runM m = evalStateT (runReaderT (runContT (unM m) return) (error "not in a loop")) M.empty
withBreakHere :: M r () -> M r ()
withBreakHere act = callCC $ \break -> local (const $ break ()) act
break :: M r ()
break = join ask
evalExpr :: Expression -> M r Value
evalExpr (Constant val) = return val
evalExpr (Var v) = gets $ fromMaybe err . M.lookup v
where
err = error $ unwords ["Variable not in scope:", show v]
evalExpr (Add e1 e2) = do
Int val1 <- evalExpr e1
Int val2 <- evalExpr e2
return $ Int $ val1 + val2
evalExpr (Not e) = do
val <- evalExpr e
return $ if isTruthy val then None else Int 1
isTruthy (String s) = not $ null s
isTruthy (Int n) = n /= 0
isTruthy None = False
evalBlock = mapM_ eval
eval :: Statement -> M r ()
eval (Assign v e) = do
val <- evalExpr e
modify $ M.insert v val
eval (Print e) = do
val <- evalExpr e
liftIO $ print val
eval (If cond block) = do
val <- evalExpr cond
when (isTruthy val) $
evalBlock block
eval Break = break
eval (While condition block) = withBreakHere $ fix $ \loop -> do
result <- evalExpr condition
unless (isTruthy result)
break
evalBlock block
loop
and here's a neat test example:
prog = [ Assign "i" $ Constant $ Int 10
, While (Var "i") [ Print (Var "i")
, Assign "i" (Add (Var "i") (Constant $ Int (-1)))
, Assign "j" $ Constant $ Int 10
, While (Var "j") [ Print (Var "j")
, Assign "j" (Add (Var "j") (Constant $ Int (-1)))
, If (Not (Add (Var "j") (Constant $ Int (-4)))) [ Break ]
]
]
, Print $ Constant $ String "Done"
]
which is
i = 10
while i:
print i
i = i - 1
j = 10
while j:
print j
j = j - 1
if j == 4:
break
so it will print
10 10 9 8 7 6 5
9 10 9 8 7 6 5
8 10 9 8 7 6 5
...
1 10 9 8 7 6 5

Haskell: Convert an expression into a list of Instructions

This is a Homework Problem.
My goal is to Convert a type Expression in the form of " into a list of CPU Instructions. Given the data structures
data Expr = Const Double | Var String | Add Expr Expr | Mult Expr Expr
data Instr = LoadImmediate RegisterNumber -- put value here
RegisterValue -- the value
| Addition RegisterNumber -- put result here
RegisterNumber -- first thing to add
RegisterNumber -- second thing to multiply
| Multiply RegisterNumber -- put result here
RegisterNumber -- first thing to multiply
RegisterNumber -- second thing to multiply
type RegisterNumber = Int
type RegisterValue = Double
Type Expression has four main functions
Const: converts a number of type double to a type expression, letting you use it.
Var: basically converts a string (i.e. "x") into a type expression letting you apply it to constant
Then then Add and Mult commands that let you add and multiply two type expressions
And we can assume that the only variable we will see is "x" and it is already in register 2. The result will arrive in register 1. There is a total of 4 registers.
So Add Const 1 (Mult (Var "x") (Const 2))
in type [Instr] would be
[LoadImmediate 3 1,
LoadImmediate 4 2,
Multiply 1 4 2,
Addition 1 3 1]
EDIT: Sorry, I for got to mention, because this is a beginner course, we only need to consider expressions of the form
(Const a) `Add` ((Var "x") `Mult` Expr)
where 'Expr' is some expression. Or in math form a0+x(a1+x(a2+x...))
I fixed my code up a little bit, now the error I'm getting is "Not in scope: data constructor 'RegNum'"
expr2Code :: Expr -> [Instr]
expr2Code expr = e2C 1 expr
e2C:: Int -> Expr -> Instr
e2C RegNum (Const y) = [LoadImmediate RegNum y]
e2C RegNum (Var "x") = []
e2C RegNum (Add expr1 expr2) = e2C 3 expr1 ++ e2C 4 expr2 ++ [Addition RegNum 3 4]
e2C RegNum (Mult expr1 expr2) = e2C 3 expr1 ++ e2C 4 expr2 ++ [Multiply RegNum 3 4]
Sorry for the long post, any help would be appreciated.
Well I'm assuming you have an infinite number of registers. If not you can experience the joy that is register spilling, but you'd need some more instructions to deal with dynamic memory.
There are 3 straightforward ways to do this
Expressions -> SSA -> Instr
Expressions -> CPS -> Instr
Expressions -> Instr
The first 2 offer much easier opportunities to optimize your use of registers and what not, but involve an intermediate language. Since we're lazy, let's do 3.
import Control.Monad.State
type Gensym = State Int
gensym :: Gensym RegisterNumber
gensym = modify (+1) >> fmap (subtract 1) get
Now that we have a way of uniquely generating registers, we can do the wonderfully inefficient approach.
withRegister :: (RegisterNumber -> [Instr]) -> Gensym (RegisterNumber, [Instr])
withRegister f = gensym >>= \r -> return (r, f r)
compile :: Expr -> Gensym (RegisterNumber, [Instr])
compile (Const d) = withRegister $ \r -> [LoadImmediate r d]
compile (Var "x") = return (2, [])
compile (Add e1 e2) = do
(t1, c1) <- compile e1 -- Recursively compile
(t2, c2) <- compile e2 -- Each subexpression like you were
withRegister $ \r -> Addition t1 t2 r : (c1 ++ c2) -- Make sure we
-- use a unique register
compile (Mult e1 e2) = do
(t1, c1) <- compile e1
(t2, c2) <- compile e2
withRegister $ \r -> Multiply t1 t2 r : (c1 ++ c2)
compileExpr :: Expr -> [Instr]
compileExpr = reverse . snd . flip evalState 3 . compile
This basically recursively compiles each expression, concatting their various chunks of code together. This is similar to what you had, but there are 3 major differences,
I'm using a monad to handle the registers for me.
You have to ensure that you never clobber a register you're going to need, by using a monad I ensure that all the registers I'm using are unique. This is really inefficient,but trivially correct.
When handling Var, I just load whatever's in register 2, since LoadImmediate wants a double and you have no mechanism for actually binding variables in your expressions.
Because you're not dealing with expressions, each chunk of computation has to be stuck in a register explicitly. You can't do x + y * z any more. That's why if you look at the code for Add or Mult, each subexpression is compiled to a fresh register.
Your example generates
[LoadImmediate 4 2.0,Multiply 2 4 5,LoadImmediate 3 1.0,Addition 3 5 6]
Which is correct, it multiplies 4 and x, then adds 3.
e2C _ (Var "x") = LoadImmediate 2 "x"
If x is already in the register 2 you don't need to load it at all. Var "x" would not translate to any load operation. rather, it translates to the operand of 2 in some other operation (addition or multiplication). For instance, (Add (Const 25) (Var "x")) would translate to [LoadImmediate 3 25, Addition 1 3 2].
e2C _ (Mult x y) = Multiply 4 (e2C _ x) (e2C _ y)
This of course does not work, as operands of Multiply are registers, not instructions. You have to translate x and note to which register rx the result goes; then translate y and note to which register ry its result goes; make sure rx != xy; and finally, issue a Multiply rz rx ry.
Now, how to determine rz, and how to make sure rx != ry? One simple strategy is to make sure each result goes to its own register (assuming there's an infinite number of them, which is of course not true for real machine architectures).
The top-level result will go to register 1.

Resources