Creating an Interpreter in Haskell - haskell

So, I need to write a function evalS :: Statement -> Store -> Store that takes as input a statement and a store and returns a possibly modified store.
The following case has been given to me:
evalS w#(While e s1) s = case (evalE e s) of
(BoolVal True,s') -> let s'' = evalS s1 s' in evalS w s''
(BoolVal False,s') -> s'
_ -> error "Condition must be a BoolVal
And I need to write:
evalS Skip s = ...
evalS (Expr e) s = ...
evalS (Sequence s1 s2) s = ...
evalS (If e s1 s2) s = ...
In the If case, if e evaluates to a non-boolean value, I need to throw an error using the error function.
Sample input/output:
> run stmtParser "x=1+1" evalS
fromList [("x",2)]
> run stmtParser "x = 2; x = x + 3" evalS
fromList [("x",5)]
> run stmtParser "if true then x = 1 else x = 2 end" evalS
fromList [("x",1)]
> run stmtParser "x=2; y=x + 3; if y < 4 then z = true else z = false end" evalS
fromList [("x",2),("y",5),("z",false)]
> run stmtParser "x = 1; while x < 3 do x = x + 1 end" evalS
fromList [("x",3)]
> run stmtParser "x = 1 ; y = 1; while x < 5 do x = x + 1 ; y = y * x end" evalS
fromList [("x",5),("y",120)]
Code for stmtParser:
-- Sequence of statements
stmtParser :: Parser Statement
stmtParser = stmtParser1 `chainl1` (P.semi lexer >> return Sequence)
-- Single statements
stmtParser1 :: Parser Statement
stmtParser1 = (Expr <$> exprParser)
<|> do
P.reserved lexer "if"
cond <- exprParser
P.reserved lexer "then"
the <- stmtParser
P.reserved lexer "else"
els <- stmtParser
P.reserved lexer "end"
return (If cond the els)
<|> do
P.reserved lexer "while"
cond <- exprParser
P.reserved lexer "do"
body <- stmtParser
P.reserved lexer "end"
return (While cond body)
WHAT I'VE TRIED:
I am not sure if I need to use evalE in this problem or not. I have written it in a previous problem. The signature for evalE is evalE :: Expression -> Store -> (Value, Store) and asked me to write:
evalE (Var x) s = ...
evalE (Val v) s = ...
evalE (Assignment x e) s = ...
I have done the above, already.
ATTEMPT:
evalS Skip s = show s -- I am assuming that since Skip returns an empty String, I just need to return an empty String.
evalS (Sequence s1 s2) s = evalS s1 >> evalS s2 -- sequence1 then sequence2. I am not quite sure what to do with the s.
evalS (Expr e) s = ... Not sure what to do, here.
evalS (If e s1 s2) s = do
x <- evalE e
case x of
BoolVal True -> evalS s1
BoolVal False -> evalS s2
I am having trouble writing the above statements.
For reference, here is the entire skeleton that was given to me to work with:
-- Necessary imports
import Control.Applicative ((<$>),liftA,liftA2)
import Data.Map
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P
--------- AST Nodes ---------
-- Variables are identified by their name as string
type Variable = String
-- Values are either integers or booleans
data Value = IntVal Int -- Integer value
| BoolVal Bool -- Boolean value
-- Expressions are variables, literal values, unary and binary operations
data Expression = Var Variable -- e.g. x
| Val Value -- e.g. 2
| BinOp Op Expression Expression -- e.g. x + 3
| Assignment Variable Expression -- e.g. x = 3
-- Statements are expressions, conditionals, while loops and sequences
data Statement = Expr Expression -- e.g. x = 23
| If Expression Statement Statement -- if e then s1 else s2 end
| While Expression Statement -- while e do s end
| Sequence Statement Statement -- s1; s2
| Skip -- no-op
-- All binary operations
data Op = Plus -- + :: Int -> Int -> Int
| Minus -- - :: Int -> Int -> Int
| Times -- * :: Int -> Int -> Int
| GreaterThan -- > :: Int -> Int -> Bool
| Equals -- == :: Int -> Int -> Bool
| LessThan -- < :: Int -> Int -> Bool
-- The `Store` is an associative map from `Variable` to `Value` representing the memory
type Store = Map Variable Value
--------- Parser ---------
-- The Lexer
lexer = P.makeTokenParser (emptyDef {
P.identStart = letter,
P.identLetter = alphaNum,
P.reservedOpNames = ["+", "-", "*", "!", ">", "=", "==", "<"],
P.reservedNames = ["true", "false", "if", "in", "then", "else", "while", "end", "to", "do", "for"]
})
-- The Parser
-- Number literals
numberParser :: Parser Value
numberParser = (IntVal . fromIntegral) <$> P.natural lexer
-- Boolean literals
boolParser :: Parser Value
boolParser = (P.reserved lexer "true" >> return (BoolVal True))
<|> (P.reserved lexer "false" >> return (BoolVal False))
-- Literals and Variables
valueParser :: Parser Expression
valueParser = Val <$> (numberParser <|> boolParser)
<|> Var <$> P.identifier lexer
-- -- Expressions
exprParser :: Parser Expression
exprParser = liftA2 Assignment
(try (P.identifier lexer >>= (\v ->
P.reservedOp lexer "=" >> return v)))
exprParser
<|> buildExpressionParser table valueParser
where table = [[Infix (op "*" (BinOp Times)) AssocLeft]
,[Infix (op "+" (BinOp Plus)) AssocLeft]
,[Infix (op "-" (BinOp Minus)) AssocLeft]
,[Infix (op ">" (BinOp GreaterThan)) AssocLeft]
,[Infix (op "==" (BinOp Equals)) AssocLeft]
,[Infix (op "<" (BinOp LessThan)) AssocLeft]]
op name node = (P.reservedOp lexer name) >> return node
-- Sequence of statements
stmtParser :: Parser Statement
stmtParser = stmtParser1 `chainl1` (P.semi lexer >> return Sequence)
-- Single statements
stmtParser1 :: Parser Statement
stmtParser1 = (Expr <$> exprParser)
<|> do
P.reserved lexer "if"
cond <- exprParser
P.reserved lexer "then"
the <- stmtParser
P.reserved lexer "else"
els <- stmtParser
P.reserved lexer "end"
return (If cond the els)
<|> do
P.reserved lexer "while"
cond <- exprParser
P.reserved lexer "do"
body <- stmtParser
P.reserved lexer "end"
return (While cond body)
-------- Helper functions --------
-- Lift primitive operations on IntVal and BoolVal values
liftIII :: (Int -> Int -> Int) -> Value -> Value -> Value
liftIII f (IntVal x) (IntVal y) = IntVal $ f x y
liftIIB :: (Int -> Int -> Bool) -> Value -> Value -> Value
liftIIB f (IntVal x) (IntVal y) = BoolVal $ f x y
-- Apply the correct primitive operator for the given Op value
applyOp :: Op -> Value -> Value -> Value
applyOp Plus = liftIII (+)
applyOp Minus = liftIII (-)
applyOp Times = liftIII (*)
applyOp GreaterThan = liftIIB (>)
applyOp Equals = liftIIB (==)
applyOp LessThan = liftIIB (<)
-- Parse and print (pp) the given WHILE programs
pp :: String -> IO ()
pp input = case (parse stmtParser "" input) of
Left err -> print err
Right x -> print x
-- Parse and run the given WHILE programs
run :: (Show v) => (Parser n) -> String -> (n -> Store -> v) -> IO ()
run parser input eval = case (parse parser "" input) of
Left err -> print err
Right x -> print (eval x empty)

It's a little difficult to answer your question, because you didn't actually ask one. Let me just pick out a few of the things that you've said, in order to give you a few clues.
I am not sure if I need to use evalE in this problem or not. I have written it in a previous problem. The signature for evalE is evalE :: Expression -> Store -> (Value, Store)
evalS (Expr e) s = ... Not sure what to do, here.
What does it mean to execute a statement which consists of an expression? If it has something to do with evaluating the expression, then the fact that you have an expression evaluator available might help with "what to do, here".
Next, compare the code you've been given for "while" (which contains a fine example of a sensible thing to do with an expression, by the way)...
evalS w#(While e s1) s = case (evalE e s) of`
(BoolVal True,s') -> let s'' = evalS s1 s' in evalS w s''
(BoolVal False,s') -> s'
_ -> error "Condition must be a BoolVal"
...and compare it with your code for "if"
evalS (If e s1 s2) s = do
x <- evalE e
case x of
BoolVal True -> evalS s1
BoolVal False -> evalS s2
Your code is in a rather different style — the "monadic" style. Where are you getting that from? It would make sense if the types of the evaluators were something like
evalE :: Expression -> State Store Value
evalS :: Statement -> State Store ()
The monadic style is a very nice way to thread the mutating store through the evaluation process without talking about it too much. E.g., your x <- evalE e means "let x be the result of evaluating e (quietly receiving the initial store and passing along the resulting store)". It's a good way to work which I expect you'll explore in due course.
But those aren't the types you've been given, and the monadic style is not appropriate. You have
evalE :: Expression -> Store -> (Value, Store)
evalS :: Statement -> Store -> Store
and the example code threads the store explicitly. Look again
evalS w#(While e s1) s = case (evalE e s) of`
(BoolVal True,s') -> let s'' = evalS s1 s' in evalS w s''
(BoolVal False,s') -> s'
_ -> error "Condition must be a BoolVal"
See? evalS receives its initial store, s, explicitly, and uses it explicitly in evalE e s. The resulting new store is called s' in both case branches. If the loop is over, then s' is given back as the final store. Otherwise, s' is used as the store for one pass through the loop body, s1, and the store s'' resulting from that is used for the next time around the loop, w.
Your code will need to be similarly explicit in the way it names and uses the store at each stage of evaluation. Let's walk through.
evalS Skip s = show s -- I am assuming that since Skip returns an empty String, I just need to return an empty String.
You assume incorrectly. The evalS function does not return a String, empty or otherwise: it returns a Store. Now, which Store? Your initial store is s: how will the store after "skip" relate to s?
evalS (Sequence s1 s2) s = evalS s1 >> evalS s2 -- sequence1 then sequence2. I am not quite sure what to do with the s.
Again, that's a monadic approach which does not fit with this question. You need to thread the store, intially s, through the process of evaluating statements s1 and s2 in sequence. The "while" case has a good example of how to do something like that.
evalS (Expr e) s = ... Not sure what to do, here.
Again, the "while" example shows you one way to extract a value and an updated store by evaluating an expression. Food for thought, isn't it?
evalS (If e s1 s2) s = ...
Now "if" starts out by evaluating a condition, rather a lot like "while", no?
So, my advice amounts to this:
drop the monadic style code for now, but come back to it later when it's appropriate;
read the example implementation of "while" and understand how it treats expressions and sequences of statements, passing the store explicitly;
deploy the similar techniques to implement the other constructs.
The person who set the question has been kind enough to give you code which gives an example of everything you will need. Please reciprocate that kindness by comprehending and then taking the hint!

Since this looks as homework I'll just provide a few small hints, leaving the real work for you.
I am not sure if I need to use evalE in this problem or not.
Yes, you'll have to. In your language, an expression e modifies the store and returns a value: you can tell that from evalE returning a pair (Value,Store)
By comparison, the statement Expr e modifies the store without returning a value. To obtain the latter (statement evaluation) from the former (expression) all you need to do is to throw away what you do not need.
About your attempt:
evalS Skip s = show s -- I am assuming that since Skip returns an empty String, I just need to return an empty String.
Why a string? Does evalS return strings? If not, what it returns? You are doing far more work than you have to, here.
evalS (Sequence s1 s2) s = evalS s1 >> evalS s2 -- sequence1 then sequence2. I am not quite sure what to do with the s.
OK, the idea is right but the code is not. Forget about monads and >>, just think about the stores. You make two recursive calls evalS s1 and evalS s2: these look wrong because evalS expects two arguments (statement and store), and you only provide one.
And -- before you try it -- no, passing s to both of them would still be wrong. In which store is the first statement evaluated? What about the second?
evalS (Expr e) s = ... Not sure what to do, here.
See the discussion above.
evalS (If e s1 s2) s = do
x <- evalE e
case x of
BoolVal True -> evalS s1
BoolVal False -> evalS s2
Avoid monad-related operations, do and <-. There might be a way to use those to solve this task, but I'd not recommend to try that path to a beginner.
You can use let if you want to name intermediate results.
evalE takes two arguments, not one. Mind that it returns a pair, not a value. evalS takes two arguments.

Related

Simulating non-deterministic choice through the List Monad

I'm trying to write an evaluation function for a language that I am working on in which non-determinism can be permitted within an if-block, called a selection block. What I'm trying to achieve is the ability to pick an if/selection statement from the block whose guard is true and evaluate it but it doesn't matter which one I pick.
From searching, I found an example that performs in a similar way to what I would like to achieve through modelling coinflips. Below is my adapation of it but I'm having issue in applying this logic to my problem.
import Control.Monad
data BranchType = Valid | Invalid deriving (Show)
data Branch = If (Bool, Integer) deriving (Show, Eq)
f Valid = [If (True, 1)]
f Invalid = [If (False, 0)]
pick = [Invalid, Invalid, Valid, Invalid, Valid]
experiment = do
b <- pick
r <- f b
guard $ fstB r
return r
s = take 1 experiment
fstB :: Branch -> Bool
fstB (If (cond, int)) = cond
main :: IO ()
main = putStrLn $ show $ s -- shows first branch which could be taken.
Below is my ADT and what I have been trying to make work:
data HStatement
= Eval HVal
| Print HVal
| Skip String
| Do HVal [HStatement]
| If (HVal, [HStatement])
| IfBlock [HStatement] -- made up of many If
| Select [HStatement] -- made up of many If
deriving (Eq, Read)
fstIf :: HStatement -> Bool
fstIf (If (cond, body)) = if hval2bool cond == True
then True
else False
h :: Env -> HStatement -> IOThrowsError ()
h env sb = do
x <- g env sb
guard $ fstIf x -- Couldn't match expected type ‘HStatement’ with actual type ‘[HStatement]’
-- after guard, take 1 x then evaluate
g :: Env -> HStatement -> IOThrowsError [HStatement]
g env (Select sb) = mapM (\x -> f env x) sb
f :: Env -> HStatement -> IOThrowsError HStatement
f env (If (cond, body)) = evalHVal env cond >>= \x -> case x of
Bool True -> return $ If (Bool True, body)
Bool False -> return $ If (Bool False, body)
The error I receive is the following : Couldn't match expected type ‘HStatement’ with actual type ‘[HStatement]’ at the guard line. I believe the reason as to why the first section of code was successful was because the values were being drawn from List but in the second case although they're being drawn from a list, they're being drawn from a [HStatement], not something that just represents a list...if that makes any sort of sense, I feel like I'm missing the vocabulary.
In essence then what should occur is given a selection block of n statement, a subset of these are produced whose guards are true and only one statement is taken from it.
The error message is pretty clear now that you have some types written down. g returns IOThrowsError [HStatement], so when you bind its result to x in h, you have an [HStatement]. You then call fstIf, which expects a single HStatement, not a list. You need to decide how to handle the multiple results from g.

Monadic Parser - handling string with one character

I was reading this Monadic Parsing article while I was trying to implement a pretty simple string parser in Haskell and also get a better understanding of using monads. Down below you can see my code, implementing functions for matching a single character or a whole string. It works as expected, but I observed two strange behaviors that I can't explain.
I have to handle single characters in string, otherwise, the parser will return only empty lists. To be exact, if I remove this line string [c] = do char c; return [c] it won't work anymore. I was expecting that string (c:s) would handle string (c:[]) properly. What could be the cause here?
In my opinion, string definition should be equivalent to string s = mapM char s as it would create a list of [Parser Char] for each character in s and collect the results as Parser [Char]. If I use the definition based on mapM, the program would get stuck in an infinite loop and won't print anything. Is something about lazy evalutation that I miss here?
.
module Main where
newtype Parser a = Parser { apply :: String->[(a, String)] }
instance Monad Parser where
return a = Parser $ \s -> [(a, s)]
ma >>= k = Parser $ \s -> concat [apply (k a) s' | (a, s') <- apply ma s]
instance Applicative Parser where
pure = return
mf <*> ma = do { f <- mf; f <$> ma; }
instance Functor Parser where
fmap f ma = f <$> ma
empty :: Parser a
empty = Parser $ const []
anychar :: Parser Char
anychar = Parser f where
f [] = []
f (c:s) = [(c, s)]
satisfy :: (Char -> Bool) -> Parser Char
satisfy prop = do
c <- anychar
if prop c then return c
else empty
char :: Char -> Parser Char
char c = satisfy (== c)
string :: String -> Parser String
string [] = empty
string [c] = do char c; return [c] --- if I remove this line, all results will be []
string (c:s) = do char c; string s; return (c:s)
main = do
let s = "12345"
print $ apply (string "123") s
print $ apply (string "12") s
print $ apply (string "1") s
print $ apply (string []) s
PS. I think the title of the question is not suggestive enough, please propose an edit if you have a better idea.
Since you did string [] = empty instead of string [] = return [], you can't use it as a base case for recursion that builds up a list.
fmap f ma = f <$> ma is wrong, since <$> is defined in terms of fmap. If you want to define fmap in terms of your other instances, then do fmap = liftA or fmap = liftM. Since mapM uses fmap internally but your original string didn't, this problem didn't come up in your first simple test.
string [] = empty
means: "If you need to parse an empty string, fail -- it can not be parsed at all, no matter what's the input string".
By comparison,
string [] = return ""
means: "If you need to parse an empty string, succeed and return the empty string -- it can always be parsed, no matter what's the input string".
By using the first equation, when you recurse in the case string (c:cs) you need to stop at one character (string [c]) since reaching zero characters will run empty and make the whole parser fail.
Hence, you need to either use that string [c] = return [c] equation, or modify the base "empty string" case so that it succeeds. Arguably, the latter would be more natural.

How far does "try" back track?

So ... I messed up a recording in CSV format:
23,95489,0,20,9888
Due to language settings floating point numbers were written with commas as seperator ... in a comma separated value file ...
Problem is that the file does not have a nice formatting for every float. Some have no point at all and the number of numbers behind the point varies too.
My idea was to build a MegaParsec parser that would try to read every possible floating point formatting, move on and if back track if it finds an error.
Eg for the example above:
read 23,95489 -> good
read 0,20 -> good (so far)
read 9888 -> error (because value is too high for column (checked by guard))
(back tracking to 2.) read 0 -> good again
read 20,9888 -> good
done
I've implemented that as (pseudo code here):
floatP = try pointyFloatP <|> unpointyFloatP
lineP = (,,) <$> floatP <* comma <*> floatP <* comma <*> floatP <* comma
My problem is that apparently the try only works in the 'current' float. There is no backtracking to previous positions. Is this correct?
And if so ... how would I go about implementing further back tracking?
How far does “try” back track?
The parser try p consumes exactly as much input as p if p parses successfully, otherwise it does not consume any input at all. So if you look at that in terms of backtracking, it backtracks to the point where you were when you invoked it.
My problem is that apparently the try only works in the 'current' float. There is no backtracking to previous positions. Is this correct?
Yes, try does not "unconsume" input. All it does is to recover from a failure in the parser you give it without consuming any input. It does not undo the effects of any parsers that you've applied previously, nor does it affect subsequent parsers that you apply after try p succeeded.
And if so ... how would I go about implementing further back tracking?
Basically what you want is to not only know whether pointyFloatP succeeds on the current input, but also whether the rest of your lineP would succeed after successfully pointyFloatP - and if it doesn't you want to backtrack back to before you applied pointyFloatP. So basically you want the parser for the whole remaining line in the try, not just the float parser.
To achieve that you can make floatP take the parser for the remaining line as an argument like this:
floatP restP = try (pointyFloatP <*> restP) <|> unpointyFloatP <*> restP
Note that this kind of backtracking isn't going to be very efficient (but I assume you knew that going in).
Update: Include a custom monadic parser for more complex rows.
Using the List Monad for Simple Parsing
The list monad makes a better backtracking "parser" than Megaparsec. For example, to parse the cells:
row :: [String]
row = ["23", "95489", "0", "20", "9888"]
into exactly three columns of values satisfying a particular bound (e.g., less than 30), you can generate all possible parses with:
{-# OPTIONS_GHC -Wall #-}
import Control.Monad
import Control.Applicative
rowResults :: [String] -> [[Double]]
rowResults = cols 3
where cols :: Int -> [String] -> [[Double]]
cols 0 [] = pure [] -- good, finished on time
cols 0 _ = empty -- bad, didn't use all the data
-- otherwise, parse exactly #n# columns from cells #xs#
cols n xs = do
-- form #d# from one or two cells
(d, ys) <- num1 xs <|> num2 xs
-- only accept #d < 30#
guard $ d < 30
ds <- cols (n-1) ys
return $ d : ds
-- read number from a single cell
num1 (x:xs) | ok1 x = pure (read x, xs)
num1 _ = empty
-- read number from two cells
num2 (x:y:zs) | ok1 x && ok2 y = pure (read (x ++ "." ++ y), zs)
num2 _ = empty
-- first cell: "0" is okay, but otherwise can't start with "0"
ok1 "0" = True
ok1 (c:_) | c /= '0' = True
ok1 _ = False
-- second cell: can't end with "0" (or *be* "0")
ok2 xs = last xs /= '0'
The above list-based parser tries to reduce ambiguity by assuming that if "xxx,yyy" is a number, the "xxx" won't start with zeros (unless it's just "0"), and the "yyy" won't end with a zero (or, for that matter, be a single "0"). If this isn't right, just modify ok1 and ok2 as appropriate.
Applied to row, this gives the single unambiguous parse:
> rowResults row
[[23.95489,0.0,20.9888]]
Applied to an ambiguous row, it gives all parses:
> rowResults ["0", "12", "5", "0", "8601"]
[[0.0,12.5,0.8601],[0.0,12.5,0.8601],[0.12,5.0,0.8601]]
Anyway, I'd suggest using a standard CSV parser to parse your file into a matrix of String cells like so:
dat :: [[String]]
dat = [ ["23", "95489", "0", "20", "9888"]
, ["0", "12", "5", "0", "8601"]
, ["23", "2611", "2", "233", "14", "422"]
]
and then use rowResults above get the row numbers of rows that were ambiguous:
> map fst . filter ((>1) . snd) . zip [1..] . map (length . rowResults) $ dat
[2]
>
or unparsable:
> map fst . filter ((==0) . snd) . zip [1..] . map (length . rowResults) $ dat
[]
>
Assuming there are no unparsable rows, you can regenerate one possible fixed file, even if some rows are ambiguous, but just grabbing the first successful parse for each row:
> putStr $ unlines . map (intercalate "," . map show . head . rowResults) $ dat
23.95489,0.0,20.9888
0.0,12.5,0.8601
23.2611,2.233,14.422
>
Using a Custom Monad based on the List Monad for More Complex Parsing
For more complex parsing, for example if you wanted to parse a row like:
type Stream = [String]
row0 :: Stream
row0 = ["Apple", "15", "1", "5016", "2", "5", "3", "1801", "11/13/2018", "X101"]
with a mixture of strings and numbers, it's actually not that difficult to write a monadic parser, based on the list monad, that generates all possible parses.
The key idea is to define a parser as a function that takes a stream and generates a list of possible parses, with each possible parse represented as a tuple of the object successfully parsed from the beginning of the stream paired with the remainder of the stream. Wrapped in a newtype, our parallel parser would look like:
newtype PParser a = PParser (Stream -> [(a, Stream)]) deriving (Functor)
Note the similarity to the type ReadS from Text.ParserCombinators.ReadP, which is also technically an "all possible parses" parser (though you usually only expect one, unambiguous parse back from a reads call):
type ReadS a = String -> [(a, String)]
Anyway, we can define a Monad instance for PParser like so:
instance Applicative PParser where
pure x = PParser (\s -> [(x, s)])
(<*>) = ap
instance Monad PParser where
PParser p >>= f = PParser $ \s1 -> do -- in list monad
(x, s2) <- p s1
let PParser q = f x
(y, s3) <- q s2
return (y, s3)
There's nothing too tricky here: pure x returns a single possible parse, namely the result x with an unchanged stream s, while p >>= f applies the first parser p to generate a list of possible parses, takes them one by one within the list monad to calculate the next parser q to use (which, as per usual for a monadic operation, can depend on the result of the first parse), and generates a list of possible final parses that are returned.
The Alternative and MonadPlus instances are pretty straightforward -- they just lift emptiness and alternation from the list monad:
instance Alternative PParser where
empty = PParser (const empty)
PParser p <|> PParser q = PParser $ \s -> p s <|> q s
instance MonadPlus PParser where
To run our parser, we have:
parse :: PParser a -> Stream -> [a]
parse (PParser p) s = map fst (p s)
and now we can introduce primitives:
-- read a token as-is
token :: PParser String
token = PParser $ \s -> case s of
(x:xs) -> pure (x, xs)
_ -> empty
-- require an end of stream
eof :: PParser ()
eof = PParser $ \s -> case s of
[] -> pure ((), s)
_ -> empty
and combinators:
-- combinator to convert a String to any readable type
convert :: (Read a) => PParser String -> PParser a
convert (PParser p) = PParser $ \s1 -> do
(x, s2) <- p s1 -- for each possible String
(y, "") <- reads x -- get each possible full read
-- (normally only one)
return (y, s2)
and parsers for various "terms" in our CSV row:
-- read a string from a single cell
str :: PParser String
str = token
-- read an integer (any size) from a single cell
int :: PParser Int
int = convert (mfilter ok1 token)
-- read a double from one or two cells
dbl :: PParser Double
dbl = dbl1 <|> dbl2
where dbl1 = convert (mfilter ok1 token)
dbl2 = convert $ do
t1 <- mfilter ok1 token
t2 <- mfilter ok2 token
return $ t1 ++ "." ++ t2
-- read a double that's < 30
dbl30 :: PParser Double
dbl30 = do
x <- dbl
guard $ x < 30
return x
-- rules for first cell of numbers:
-- "0" is okay, but otherwise can't start with "0"
ok1 :: String -> Bool
ok1 "0" = True
ok1 (c:_) | c /= '0' = True
ok1 _ = False
-- rules for second cell of numbers:
-- can't be "0" or end in "0"
ok2 :: String -> Bool
ok2 xs = last xs /= '0'
Then, for a particular row schema, we can write a row parser as we normally would with a monadic parser:
-- a row
data Row = Row String Int Double Double Double
Int String String deriving (Show)
rowResults :: PParser Row
rowResults = Row <$> str <*> int <*> dbl30 <*> dbl30 <*> dbl30
<*> int <*> str <*> str <* eof
and get all possible parses:
> parse rowResults row0
[Row "Apple" 15 1.5016 2.0 5.3 1801 "11/13/2018" "X101"
,Row "Apple" 15 1.5016 2.5 3.0 1801 "11/13/2018" "X101"]
>
The full program is:
{-# LANGUAGE DeriveFunctor #-}
{-# OPTIONS_GHC -Wall #-}
import Control.Monad
import Control.Applicative
type Stream = [String]
newtype PParser a = PParser (Stream -> [(a, Stream)]) deriving (Functor)
instance Applicative PParser where
pure x = PParser (\s -> [(x, s)])
(<*>) = ap
instance Monad PParser where
PParser p >>= f = PParser $ \s1 -> do -- in list monad
(x, s2) <- p s1
let PParser q = f x
(y, s3) <- q s2
return (y, s3)
instance Alternative PParser where
empty = PParser (const empty)
PParser p <|> PParser q = PParser $ \s -> p s <|> q s
instance MonadPlus PParser where
parse :: PParser a -> Stream -> [a]
parse (PParser p) s = map fst (p s)
-- read a token as-is
token :: PParser String
token = PParser $ \s -> case s of
(x:xs) -> pure (x, xs)
_ -> empty
-- require an end of stream
eof :: PParser ()
eof = PParser $ \s -> case s of
[] -> pure ((), s)
_ -> empty
-- combinator to convert a String to any readable type
convert :: (Read a) => PParser String -> PParser a
convert (PParser p) = PParser $ \s1 -> do
(x, s2) <- p s1 -- for each possible String
(y, "") <- reads x -- get each possible full read
-- (normally only one)
return (y, s2)
-- read a string from a single cell
str :: PParser String
str = token
-- read an integer (any size) from a single cell
int :: PParser Int
int = convert (mfilter ok1 token)
-- read a double from one or two cells
dbl :: PParser Double
dbl = dbl1 <|> dbl2
where dbl1 = convert (mfilter ok1 token)
dbl2 = convert $ do
t1 <- mfilter ok1 token
t2 <- mfilter ok2 token
return $ t1 ++ "." ++ t2
-- read a double that's < 30
dbl30 :: PParser Double
dbl30 = do
x <- dbl
guard $ x < 30
return x
-- rules for first cell of numbers:
-- "0" is okay, but otherwise can't start with "0"
ok1 :: String -> Bool
ok1 "0" = True
ok1 (c:_) | c /= '0' = True
ok1 _ = False
-- rules for second cell of numbers:
-- can't be "0" or end in "0"
ok2 :: String -> Bool
ok2 xs = last xs /= '0'
-- a row
data Row = Row String Int Double Double Double
Int String String deriving (Show)
rowResults :: PParser Row
rowResults = Row <$> str <*> int <*> dbl30 <*> dbl30 <*> dbl30
<*> int <*> str <*> str <* eof
row0 :: Stream
row0 = ["Apple", "15", "1", "5016", "2", "5", "3", "1801", "11/13/2018", "X101"]
main = print $ parse rowResults row0
Off-the-shelf Solutions
I find it a little surprising I can't find an existing parser library out there that provides this kind of "all possible parses" parser. The stuff in Text.ParserCombinators.ReadP takes the right approach, but it assumes that you're parsing characters from a String rather than arbitrary tokens from some other stream (in our case, Strings from a [String]).
Maybe someone else can point out an off-the-shelf solution that would save you from having to role your own parser type, instances, and primitives.

Is this syntax as expressive as the do-notation?

The do notation allows us to express monadic code without overwhelming nestings, so that
main = getLine >>= \ a ->
getLine >>= \ b ->
putStrLn (a ++ b)
can be expressed as
main = do
a <- getLine
b <- getLine
putStrLn (a ++ b)
Suppose, though, the syntax allows ... #expression ... to stand for do { x <- expression; return (... x ...) }. For example, foo = f a #(b 1) c would be desugared as: foo = do { x <- b 1; return (f a x c) }. The code above could, then, be expressed as:
main = let a = #getLine in
let b = #getLine in
putStrLn (a ++ b)
Which would be desugared as:
main = do
x <- getLine
let a = x in
return (do
x' <- getLine
let b = x' in
return (putStrLn (a ++ b)))
That is equivalent. This syntax is appealing to me because it seems to offer the same functionality as the do-notation, while also allowing some shorter expressions such as:
main = putStrLn (#(getLine) ++ #(getLine))
So, I wonder if there is anything defective with this proposed syntax, or if it is indeed complete and equivalent to the do-notation.
putStrLn is already String -> IO (), so your desugaring ... return (... return (putStrLn (a ++ b))) ends up having type IO (IO (IO ())), which is likely not what you wanted: running this program won't print anything!
Speaking more generally, your notation can't express any do-block which doesn't end in return. [See Derek Elkins' comment.]
I don't believe your notation can express join, which can be expressed with do without any additional functions:
join :: Monad m => m (m a) -> m a
join mx = do { x <- mx; x }
However, you can express fmap constrained to Monad:
fmap' :: Monad m => (a -> b) -> m a -> m b
fmap' f mx = f #mx
and >>= (and thus everything else) can be expressed using fmap' and join. So adding join would make your notation complete, but still not convenient in many cases, because you end up needing a lot of joins.
However, if you drop return from the translation, you get something quite similar to Idris' bang notation:
In many cases, using do-notation can make programs unnecessarily verbose, particularly in cases such as m_add above where the value bound is used once, immediately. In these cases, we can use a shorthand version, as follows:
m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = pure (!x + !y)
The notation !expr means that the expression expr should be evaluated and then implicitly bound. Conceptually, we can think of ! as being a prefix function with the following type:
(!) : m a -> a
Note, however, that it is not really a function, merely syntax! In practice, a subexpression !expr will lift expr as high as possible within its current scope, bind it to a fresh name x, and replace !expr with x. Expressions are lifted depth first, left to right. In practice, !-notation allows us to program in a more direct style, while still giving a notational clue as to which expressions are monadic.
For example, the expression:
let y = 42 in f !(g !(print y) !x)
is lifted to:
let y = 42 in do y' <- print y
x' <- x
g' <- g y' x'
f g'
Adding it to GHC was discussed, but rejected (so far). Unfortunately, I can't find the threads discussing it.
How about this:
do a <- something
b <- somethingElse a
somethingFinal a b

Implementing a language interpreter in Haskell

I want to implement an imperative language interpreter in Haskell (for educational purposes). But it's difficult for me to create right architecture for my interpreter: How should I store variables? How can I implement nested function calls? How should I implement variable scoping? How can I add debugging possibilities in my language? Should I use monads/monad transformers/other techniques? etc.
Does anybody know good articles/papers/tutorials/sources on this subject?
If you are new to writing this kind of processors, I would recommend to put off using monads for a while and first focus on getting a barebones implementation without any bells or whistles.
The following may serve as a minitutorial.
I assume that you have already tackled the issue of parsing the source text of the programs you want to write an interpreter for and that you have some types for capturing the abstract syntax of your language. The language that I use here is very simple and only consists of integer expressions and some basic statements.
Preliminaries
Let us first import some modules that we will use in just a bit.
import Data.Function
import Data.List
The essence of an imperative language is that it has some form of mutable variables. Here, variables simply represented by strings:
type Var = String
Expressions
Next, we define expressions. Expressions are constructed from integer constants, variable references, and arithmetic operations.
infixl 6 :+:, :-:
infixl 7 :*:, :/:
data Exp
= C Int -- constant
| V Var -- variable
| Exp :+: Exp -- addition
| Exp :-: Exp -- subtraction
| Exp :*: Exp -- multiplication
| Exp :/: Exp -- division
For example, the expression that adds the constant 2 to the variable x is represented by V "x" :+: C 2.
Statements
The statement language is rather minimal. We have three forms of statements: variable assignments, while loops, and sequences.
infix 1 :=
data Stmt
= Var := Exp -- assignment
| While Exp Stmt -- loop
| Seq [Stmt] -- sequence
For example, a sequence of statements for "swapping" the values of the variables x and y can be represented by Seq ["tmp" := V "x", "x" := V "y", "y" := V "tmp"].
Programs
A program is just a statement.
type Prog = Stmt
Stores
Now, let us move to the actual interpreter. While running a program, we need to keep track of the values assigned to the different variables in the programs. These values are just integers and as a representation of our "memory" we just use lists of pairs consisting of a variable and a value.
type Val = Int
type Store = [(Var, Val)]
Evaluating expressions
Expressions are evaluated by mapping constants to their value, looking up the values of variables in the store, and mapping arithmetic operations to their Haskell counterparts.
eval :: Exp -> Store -> Val
eval (C n) r = n
eval (V x) r = case lookup x r of
Nothing -> error ("unbound variable `" ++ x ++ "'")
Just v -> v
eval (e1 :+: e2) r = eval e1 r + eval e2 r
eval (e1 :-: e2) r = eval e1 r - eval e2 r
eval (e1 :*: e2) r = eval e1 r * eval e2 r
eval (e1 :/: e2) r = eval e1 r `div` eval e2 r
Note that if the store contains multiple bindings for a variable, lookup selects the bindings that comes first in the store.
Executing statements
While the evaluation of an expression cannot alter the contents of the store, executing a statement may in fact result in an update of the store. Hence, the function for executing a statement takes a store as an argument and produces a possibly updated store.
exec :: Stmt -> Store -> Store
exec (x := e) r = (x, eval e r) : r
exec (While e s) r | eval e r /= 0 = exec (Seq [s, While e s]) r
| otherwise = r
exec (Seq []) r = r
exec (Seq (s : ss)) r = exec (Seq ss) (exec s r)
Note that, in the case of assignments, we simply push a new binding for the updated variable to the store, effectively shadowing any previous bindings for that variable.
Top-level Interpreter
Running a program reduces to executing its top-level statement in the context of an initial store.
run :: Prog -> Store -> Store
run p r = nubBy ((==) `on` fst) (exec p r)
After executing the statement we clean up any shadowed bindings, so that we can easily read off the contents of the final store.
Example
As an example, consider the following program that computes the Fibonacci number of the number stored in the variable n and stores its result in the variable x.
fib :: Prog
fib = Seq
[ "x" := C 0
, "y" := C 1
, While (V "n") $ Seq
[ "z" := V "x" :+: V "y"
, "x" := V "y"
, "y" := V "z"
, "n" := V "n" :-: C 1
]
]
For instance, in an interactive environment, we can now use our interpreter to compute the 25th Fibonacci number:
> lookup "x" $ run fib [("n", 25)]
Just 75025
Monadic Interpretation
Of course, here, we are dealing with a very simple and tiny imperative language. As your language gets more complex, so will the implementation of your interpreter. Think for example about what additions you need when you add procedures and need to distinguish between local (stack-based) storage and global (heap-based) storage. Returning to that part of your question, you may then indeed consider the introduction of monads to streamline the implementation of your interpreter a bit.
In the example interpreter above, there are two "effects" that are candidates for being captured by a monadic structure:
The passing around and updating of the store.
Aborting running the program when a run-time error is encountered. (In the implementation above, the interpreter simply crashes when such an error occurs.)
The first effect is typically captured by a state monad, the second by an error monad. Let us briefly investigate how to do this for our interpreter.
We prepare by importing just one more module from the standard libraries.
import Control.Monad
We can use monad transformers to construct a composite monad for our two effects by combining a basic state monad and a basic error monad. Here, however, we simply construct the composite monad in one go.
newtype Interp a = Interp { runInterp :: Store -> Either String (a, Store) }
instance Monad Interp where
return x = Interp $ \r -> Right (x, r)
i >>= k = Interp $ \r -> case runInterp i r of
Left msg -> Left msg
Right (x, r') -> runInterp (k x) r'
fail msg = Interp $ \_ -> Left msg
Edit 2018: The Applicative Monad Proposal
Since the Applicative Monad Proposal (AMP) every Monad must also be an instance of Functor and Applicative. To do this we can add
import Control.Applicative -- Otherwise you can't do the Applicative instance.
to the imports and make Interp an instance of Functor and Applicative like this
instance Functor Interp where
fmap = liftM -- imported from Control.Monad
instance Applicative Interp where
pure = return
(<*>) = ap -- imported from Control.Monad
Edit 2018 end
For reading from and writing to the store, we introduce effectful functions rd and wr:
rd :: Var -> Interp Val
rd x = Interp $ \r -> case lookup x r of
Nothing -> Left ("unbound variable `" ++ x ++ "'")
Just v -> Right (v, r)
wr :: Var -> Val -> Interp ()
wr x v = Interp $ \r -> Right ((), (x, v) : r)
Note that rd produces a Left-wrapped error message if a variable lookup fails.
The monadic version of the expression evaluator now reads
eval :: Exp -> Interp Val
eval (C n) = do return n
eval (V x) = do rd x
eval (e1 :+: e2) = do v1 <- eval e1
v2 <- eval e2
return (v1 + v2)
eval (e1 :-: e2) = do v1 <- eval e1
v2 <- eval e2
return (v1 - v2)
eval (e1 :*: e2) = do v1 <- eval e1
v2 <- eval e2
return (v1 * v2)
eval (e1 :/: e2) = do v1 <- eval e1
v2 <- eval e2
if v2 == 0
then fail "division by zero"
else return (v1 `div` v2)
In the case for :/:, division by zero results in an error message being produced through the Monad-method fail, which, for Interp, reduces to wrapping the message in a Left-value.
For the execution of statements we have
exec :: Stmt -> Interp ()
exec (x := e) = do v <- eval e
wr x v
exec (While e s) = do v <- eval e
when (v /= 0) (exec (Seq [s, While e s]))
exec (Seq []) = do return ()
exec (Seq (s : ss)) = do exec s
exec (Seq ss)
The type of exec conveys that statements do not result in values but are executed only for their effects on the store or the run-time errors they may trigger.
Finally, in the function run we perform a monadic computation and process its effects.
run :: Prog -> Store -> Either String Store
run p r = case runInterp (exec p) r of
Left msg -> Left msg
Right (_, r') -> Right (nubBy ((==) `on` fst) r')
In the interactive environment, we can now revisit the interpretation of our example program:
> lookup "x" `fmap` run fib [("n", 25)]
Right (Just 75025)
> lookup "x" `fmap` run fib []
Left "unbound variable `n'"
A couple of good papers I've finally found:
Building Interpreters by Composing Monads
Monad Transformers Step by Step - how incrementally build tiny interpreter using
monad transformers
How to build a monadic interpreter in one day
Monad Transformers and Modular Interpreters

Resources