I'm trying to implement an interpreter for a programming language with lazy-binding in Haskell.
I'm using the tying-the-knot pattern to implement the evaluation of expressions. However, I found it extremely hard to debug and to reason about. I spent at least 40 hours working on this. I learned a lot about laziness and tying the knot, but I haven't reached a solution yet and some behaviors still puzzle me.
Questions
Is there a sensible way to debug the knot and figure out what causes it to bottom?
The GHC stack trace (printed when using profiling options) shows which function inside the knot triggers a loop. But that's not helpful: I need to understand what makes the knot strict in the knot's definition, and I couldn't find a way to show this.
It's been really hard to understand why the knot bottoms, and I don't think it will be much easier the next time I have to debug something like this.
How should I tie the knot in a monadic context? I learned that a function like traverse is strict for most types and this causes the knot to bottom.
The only solution I can think of is to remove the knot. That would increase the problem's complexity (every value would need to be re-computed every time), although this issue can be resolved by caching the value in an STRef: that's exactly what I would do in a strict language. I would prefer to avoid this solution and take advantage of Haskell's laziness, otherwise what's the point of it?
In the code I provide later in this post, why does evalSt e1 terminate, while evalSt e2 doesn't? I still can't understand what the difference is.
Language's AST
I tried to simplify my AST as much as possible, and this is the most minimal definition I could come up with:
import Data.List (intercalate)
import qualified Data.Map as M

data Expr = Int Int | Negate Expr | Id String | Obj (M.Map String Expr)
  deriving (Eq, Ord, Show)

pprint :: Expr -> String
pprint e = case e of
  Int i -> show i
  Negate i -> "(-" ++ pprint i ++ ")"
  Id i -> i
  Obj obj -> "{" ++ intercalate ", "
    [ k ++ ":" ++ pprint v | (k,v) <- M.toList obj ] ++ "}"
Example programs
Here are a couple of example expressions represented with the AST above:
-- expression: {a:{aa1:(-b), aa2:ab, ab:(-b)}, b:3}
-- should evaluate to: {a:{aa1:-3, aa2:-3, ab:-3}, b:3}
e1 = Obj $ M.fromList [
  ("a", Obj $ M.fromList [
    ("aa1", Negate $ Id "b"),
    ("aa2", Id "ab"),
    ("ab", Negate $ Id "b")
  ]),
  ("b", Int 3)
  ]
-- expression: {a:{aa:(-ab), ab:b}, b:3}
-- should evaluate to: {a:{aa:-3, ab:3}, b:3}
e2 = Obj $ M.fromList [
  ("a", Obj $ M.fromList [
    ("aa", Negate $ Id "ab"),
    ("ab", Id "b")
  ]),
  ("b", Int 3)
  ]
Pure eval function
I then defined a function to evaluate an expression. This is the simplest definition I could write:
type Scope = M.Map String Expr

eval :: Scope -> Expr -> Expr
eval scope expr = case expr of
  Int i -> Int i
  Id str -> case M.lookup str scope of
    Just e -> e
    Nothing -> error $ str ++ " not in scope"
  Negate aE ->
    case eval scope aE of
      Int i -> Int $ -i
      _ -> error $ "Can only negate ints. Found: " ++ pprint aE
  Obj kvMap -> Obj $
    let resMap = fmap (eval (M.union resMap scope)) kvMap
    in resMap
Tying the knot
The most interesting part of the eval function is the knot-tying in the Obj kvMap case:
  let resMap = fmap (eval (M.union resMap scope)) kvMap
  in resMap
The idea is that in order to compute the expressions in kvMap, the identifiers need to be able to access both the values in scope and the results of the expressions in kvMap. The computed values are resMap, and to compute them we use the scope resMap ⋃ scope.
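As a smaller illustration of the same idea, separate from the interpreter, here is a minimal sketch of a map whose entries refer to other entries of the same map. The name `table` is made up for this example, and the sketch assumes the lazy `Data.Map` (values are only evaluated when looked up).

```haskell
import qualified Data.Map as M

-- A map defined in terms of itself: each value is a thunk that may
-- look up other keys of the same map once it is forced.
table :: M.Map String Int
table = M.fromList
  [ ("b", 3)
  , ("ab", negate (table M.! "b"))  -- refers to "b"
  , ("aa", table M.! "ab")          -- refers to "ab"
  ]

main :: IO ()
main = print (M.toList table)
```

This only works because building the map's structure never forces the values; an entry that referred to itself (for example `("x", table M.! "x")`) would still loop when forced.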
It works!
This eval function works as expected:
GHCi> pprint $ eval M.empty e1
"{a:{aa1:-3, aa2:-3, ab:-3}, b:3}"
GHCi> pprint $ eval M.empty e2
"{a:{aa:-3, ab:3}, b:3}"
Monadic evaluation
The limitation of the eval function above is that it's pure. In some cases I need to evaluate expressions in a monadic context. For instance, I may need IO to offer non-pure functions to the guest language.
I've implemented dozens of versions of eval (both monadic, using RecursiveDo, and of various degrees of purity) in an attempt to understand the issues. I'm presenting the two most interesting ones:
Passing the scope through a State monad
evalSt' :: Expr -> State Scope Expr
evalSt' expr = do
  scope <- get
  case expr of
    Int i -> pure $ Int i
    Id str -> case M.lookup str scope of
      Just e -> pure e
      Nothing -> error $ str ++ " not in scope"
    Negate aE -> do
      a <- evalSt' aE
      case a of
        Int i -> pure $ Int $ -i
        _ -> error $ "Can only negate ints. Found: " ++ pprint aE
    Obj obj -> mdo
      put $ M.union newScope scope
      newScope <- traverse evalSt' obj
      put scope
      pure $ Obj newScope

evalSt scope expr = evalState (evalSt' expr) scope
This function is able to evaluate the program e1, but it bottoms (never returns) on e2:
GHCi> pprint $ evalSt M.empty e1
"{a:{aa1:-3, aa2:-3, ab:-3}, b:3}"
GHCi> pprint $ evalSt M.empty e2
"{a:{aa:
I still don't understand how it can compute e1, since it does contain Ids: isn't that program strict in the scope, and shouldn't it make evalSt bottom? Why doesn't it? And what's different in e2 that causes the function not to terminate?
Evaluating in the IO monad
evalM :: Scope -> Expr -> IO Expr
evalM scope expr = case expr of
  Int i -> pure $ Int i
  Id str -> case M.lookup str scope of
    Just e -> pure e
    Nothing -> error $ str ++ " not in scope"
  Negate aE -> do
    a <- evalM scope aE
    case a of
      Int i -> pure $ Int $ -i
      _ -> error $ "Can only negate ints. Found: " ++ pprint aE
  Obj kvMap -> mdo
    resMap <- traverse (evalM (M.union resMap scope)) kvMap
    pure $ Obj resMap
This function always bottoms (never returns) on every program that uses at least one Id node. Even just {a:1, b:a}.
Scroll back to the top for the questions :-)
How should I tie the knot in a monadic context?
Your pure evaluation function relies on there being no evaluation order in the semantics of Haskell, so that thunks get forced only when needed. In contrast, most effects are fundamentally ordered, so there is an incompatibility there.
Some monads are lazier than the others, and for those you can get some result out of making your evaluation function monadic, as you've seen with evalSt e1. The two most common lazy monads are Reader and lazy State (which is the one you get from Control.Monad.State, as opposed to Control.Monad.State.Strict).
But for other effects, such as IO, you must control evaluation order explicitly, and that means implementing the cache for lazy evaluation explicitly (via STRef for example), instead of implicitly relying on Haskell's own runtime.
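A minimal sketch of such an explicit cache, assuming IORef rather than STRef and a separate Value type for results. The names Thunk, Value, delay, force, evalIO and render are made up for this illustration and are not part of the question's code. Each field becomes a memoised suspended computation, and the scope is patched in through a mutable cell after all thunks have been created, so nothing is evaluated while the knot is being tied.

```haskell
import Data.IORef
import Data.List (intercalate)
import qualified Data.Map as M

data Expr = Int Int | Negate Expr | Id String | Obj (M.Map String Expr)

-- A thunk is either a pending computation or its cached result.
type Thunk = IORef (Either (IO Value) Value)
data Value = VInt Int | VObj (M.Map String Thunk)
type Scope = M.Map String Thunk

delay :: IO Value -> IO Thunk
delay = newIORef . Left

-- Run the pending computation at most once and cache the result.
force :: Thunk -> IO Value
force ref = do
  st <- readIORef ref
  case st of
    Right v -> pure v
    Left act -> do
      -- Crude cycle detection: re-entering this thunk now fails.
      writeIORef ref (Left (ioError (userError "cyclic definition")))
      v <- act
      writeIORef ref (Right v)
      pure v

evalIO :: Scope -> Expr -> IO Value
evalIO scope expr = case expr of
  Int i -> pure (VInt i)
  Id str -> case M.lookup str scope of
    Just t -> force t
    Nothing -> ioError (userError (str ++ " not in scope"))
  Negate e -> do
    v <- evalIO scope e
    case v of
      VInt i -> pure (VInt (negate i))
      VObj _ -> ioError (userError "Can only negate ints")
  Obj kvMap -> do
    -- Create all thunks first; they read the final scope only when forced.
    scopeRef <- newIORef scope
    thunks <- traverse (\e -> delay (readIORef scopeRef >>= \s -> evalIO s e)) kvMap
    writeIORef scopeRef (M.union thunks scope)
    pure (VObj thunks)

render :: Value -> IO String
render (VInt i) = pure (show i)
render (VObj m) = do
  fields <- traverse (\(k, t) -> fmap ((k ++ ":") ++) (force t >>= render)) (M.toList m)
  pure ("{" ++ intercalate ", " fields ++ "}")

e2 :: Expr
e2 = Obj $ M.fromList
  [ ("a", Obj $ M.fromList [("aa", Negate $ Id "ab"), ("ab", Id "b")])
  , ("b", Int 3)
  ]

main :: IO ()
main = evalIO M.empty e2 >>= render >>= putStrLn
```

The trade-off is that evaluation order is now spelled out by hand: fields are evaluated only when force is called, and arbitrary IO can be added to any case of evalIO without affecting termination.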
In the code I provide later in this post, why does evalSt e1 terminate, while evalSt e2 doesn't? I still can't understand what the difference is.
To see what is going wrong, unfold traverse evalSt' obj where obj is {aa:(-ab), ab:b}.
  traverse evalSt' obj
=
  do
    x <- evalSt' (Negate (Id "ab"))
    y <- evalSt' (Id "b")
    pure [("aa", x), ("ab", y)]
=
  do
    -- evalSt' (Negate (Id "ab"))
    scope1 <- get -- unused
    a <- evalSt' (Id "ab")
    x <- case a of
      Int i -> pure $ Int $ -i
      _ -> error ...
    -- evalSt' (Id "b")
    scope2 <- get
    y <- case M.lookup "b" scope2 of
      Just e -> pure e
      Nothing -> error ...
    pure [("aa", x), ("ab", y)]
1. We try to print the object e2, which ends up looking at the value of the "aa" field, which is x in the code above.
2. x comes from case a of ..., which needs a.
3. a comes from evalSt' (Id "ab"), which needs the field "ab", which is y (from the knot tying surrounding the traverse evalSt' obj we are looking at).
4. y comes from case M.lookup "b" scope2 of ..., which needs scope2.
5. scope2 comes from get, which gets the output state from the action preceding it, which is evaluating x.
6. We are already trying to evaluate x (from step 2). Hence there is an infinite loop.
This can be fixed by always restoring the state at the end of evalSt' (technically, you only need to do this for Id and Negate, but might as well do it always):
evalSt' e = do
  scope <- get
  v <- case e of ...
  put scope
  pure v
Or use Reader instead, which gives you the power to update state locally for subcomputations, which is exactly what you need here. You can use local to surround traverse evalSt' obj:
newScope <- local (const (newScope `M.union` scope)) (traverse evalSt' obj)
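Put together, a Reader-based evaluator could look like the sketch below. It assumes the mtl package for Control.Monad.Reader and the RecursiveDo extension; the name evalR is made up here, while Expr, pprint and e2 are repeated from the question so that the block is self-contained.

```haskell
{-# LANGUAGE RecursiveDo #-}
import Control.Monad.Reader (Reader, ask, local, runReader)
import Data.List (intercalate)
import qualified Data.Map as M

data Expr = Int Int | Negate Expr | Id String | Obj (M.Map String Expr)
  deriving (Eq, Ord, Show)

type Scope = M.Map String Expr

pprint :: Expr -> String
pprint e = case e of
  Int i -> show i
  Negate i -> "(-" ++ pprint i ++ ")"
  Id i -> i
  Obj obj -> "{" ++ intercalate ", "
    [ k ++ ":" ++ pprint v | (k, v) <- M.toList obj ] ++ "}"

evalR :: Expr -> Reader Scope Expr
evalR expr = case expr of
  Int i -> pure (Int i)
  Id str -> do
    scope <- ask
    case M.lookup str scope of
      Just e -> pure e
      Nothing -> error (str ++ " not in scope")
  Negate aE -> do
    a <- evalR aE
    case a of
      Int i -> pure (Int (negate i))
      _ -> error ("Can only negate ints. Found: " ++ pprint aE)
  Obj obj -> mdo
    scope <- ask
    -- The extended scope is visible only inside the traversal,
    -- so there is no state to restore afterwards.
    newScope <- local (const (M.union newScope scope)) (traverse evalR obj)
    pure (Obj newScope)

e2 :: Expr
e2 = Obj $ M.fromList
  [ ("a", Obj $ M.fromList [("aa", Negate $ Id "ab"), ("ab", Id "b")])
  , ("b", Int 3)
  ]

main :: IO ()
main = putStrLn (pprint (runReader (evalR e2) M.empty))
```

Because the environment flows only downwards, evaluating one field never depends on the "output" of evaluating a sibling field, which is the dependency that caused the loop in the State version.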
Is there a sensible way to debug the knot and figure out what causes it to bottom?
I don't have a good answer to this. I'm not familiar with debugging tools in Haskell.
You cannot rely on stack traces because subexpressions may force each other in a rather chaotic order. And there is something interfering with print-debugging (Debug.Trace) that I don't understand. (I would add Debug.Trace.trace (pprint expr) $ do at the beginning of evalSt', but then the trace doesn't make sense to me because things that should be printed once are replicated many times.)
My question relates to the simple interpreter written in this answer. I already asked a similar question before, which relates to the first, non-monadic interpreter in the answer behind the link. But there is a second, monadic one, to which this question relates.
How could one add IO capabilities to the monadic interpreter? (You need to scroll down, because the answer contains two variants, the first being non-monadic and the second monadic.) By this I simply mean adding a statement that uses putStrLn. I'm not that well versed in Haskell yet, but I'm guessing you can just combine the IO monad with the interpreter monad somehow. Can somebody point me in the right direction?
data Stmt
  = Var := Exp
  | While Exp Stmt
  | Seq [Stmt]
  | Print Exp -- a print statement
A straightforward approach is to change Interp to incorporate IO.
newtype Interp a = Interp { runInterp :: Store -> IO (Either String (a, Store)) }
Then we just need to update the Monad instance, rd, wr, and run for the new internals of Interp by sprinkling in some returns and binds. For example, here’s the new Monad instance:
instance Monad Interp where
  return x = Interp $ \r -> return (Right (x, r))
  i >>= k =
    Interp $ \r -> do
      res <- runInterp i r
      case res of
        Left msg -> return (Left msg)
        Right (x, r') -> runInterp (k x) r'
  fail msg = Interp $ \_ -> return (Left msg)
One of the advantages of having abstracted out Interp in the first place was so that we can make these kinds of changes without modifying the main part of the interpreter (eval and exec) at all.
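For reference, here is a self-contained sketch of how this might look on a recent GHC, where Monad requires Functor and Applicative instances and fail lives in MonadFail. Several things here are assumptions made for the example rather than the linked answer's code: Store is taken to be a map from names to Ints, Stmt is cut down to two constructors, and liftIO' is a hypothetical helper for running an IO action inside Interp.

```haskell
import Control.Monad (ap)
import qualified Data.Map as M

type Store = M.Map String Int  -- assumption: the real Store may differ

newtype Interp a = Interp { runInterp :: Store -> IO (Either String (a, Store)) }

instance Functor Interp where
  fmap f i = Interp $ \r -> fmap (fmap (\(x, r') -> (f x, r'))) (runInterp i r)

instance Applicative Interp where
  pure x = Interp $ \r -> pure (Right (x, r))
  (<*>) = ap

instance Monad Interp where
  i >>= k = Interp $ \r -> do
    res <- runInterp i r
    case res of
      Left msg -> pure (Left msg)
      Right (x, r') -> runInterp (k x) r'

instance MonadFail Interp where
  fail msg = Interp $ \_ -> pure (Left msg)

-- Hypothetical helper: run an IO action and leave the store untouched.
liftIO' :: IO a -> Interp a
liftIO' act = Interp $ \r -> fmap (\x -> Right (x, r)) act

rd :: String -> Interp Int
rd x = Interp $ \r -> pure $ case M.lookup x r of
  Nothing -> Left ("unbound variable " ++ x)
  Just v -> Right (v, r)

wr :: String -> Int -> Interp ()
wr x v = Interp $ \r -> pure (Right ((), M.insert x v r))

-- Simplified statements: assignment of a literal, and printing a variable.
data Stmt = String := Int | Print String

exec :: Stmt -> Interp ()
exec (x := v) = wr x v
exec (Print x) = rd x >>= liftIO' . print

main :: IO ()
main = do
  res <- runInterp (mapM_ exec ["x" := 41, Print "x", Print "y"]) M.empty
  putStrLn (either ("error: " ++) (const "ok") res)
```

Running main prints 41 for the first Print and then reports the unbound variable y, since the Left result short-circuits the remaining statements.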
I'm trying to crudely replicate Parsec in Lua, and I'm having a bit of trouble with the bind function: because it is recursive, it generates nested runParser calls.
function Parser:bind(f)
  return new(function(s)
    local result = self.runParser(s)
    if result.cons() == Result.Success then
      return f(result.get()).runParser(result.get(2))
    else
      return result
    end
  end)
end
I'm using a custom system of making ADTs, hence the cons() and get() functions on the return value. The equivalent Haskell code would be something like this.
m >>= f = Parser $ \s -> case result of
    Success a cs -> runParser (f a) cs
    _ -> result
  where
    result = runParser m s
The argument to the Parser constructor (the new function in Lua) is the runParser function. So calling a different runParser from within runParser non-tail-recursively generates very deep call stacks, which causes a stack overflow. Any tips on removing the recursion or translating it to tail-recursion?
Continuation passing made this very easy to solve.
function Parser:bind(f)
  return new(function(s, cont)
    return self.runParser(s, function(result)
      if result.cons() == Result.Success then
        return f(result.get()).runParser(result.get(2), cont)
      else
        return cont(result)
      end
    end)
  end)
end
This way, it's tail calls all the way down! Admittedly, there's potential for f to overflow all on its own, but that would be a case of bad programming on the user's side, as f shouldn't go very deep at all.
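For comparison with the Haskell version of bind given in the question, the same continuation-passing shape can be sketched in Haskell as below. This is only an illustration of the structure: the names unit, item, bind and parse are made up, the Failure constructor carrying a message is an assumption, and the stack behaviour described above is specific to Lua's proper tail calls.

```haskell
{-# LANGUAGE RankNTypes #-}

data Result a = Success a String | Failure String
  deriving (Eq, Show)

-- A parser takes the input and a continuation that receives the result.
newtype Parser a = Parser
  { runParser :: forall r. String -> (Result a -> r) -> r }

unit :: a -> Parser a
unit a = Parser (\s cont -> cont (Success a s))

-- Consume a single character, failing on empty input.
item :: Parser Char
item = Parser (\s cont -> case s of
  (c:cs) -> cont (Success c cs)
  []     -> cont (Failure "unexpected end of input"))

-- Every call is in tail position: the first parser is handed a
-- continuation that either runs the second parser or passes the
-- failure on to the outer continuation.
bind :: Parser a -> (a -> Parser b) -> Parser b
bind m f = Parser (\s cont ->
  runParser m s $ \result -> case result of
    Success a cs -> runParser (f a) cs cont
    Failure msg  -> cont (Failure msg))

parse :: Parser a -> String -> Result a
parse p s = runParser p s id

main :: IO ()
main = print (parse (item `bind` \a -> item `bind` \b -> unit [a, b]) "xyz")
```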
I am a new lisp user. I have been trying to figure out how to use the lisp read command for about an hour by googling and looking for examples. I have been unsuccessful, and am finally throwing in the towel.
Can someone give me a very simple example of a lisp function that will accept 2 inputs, and add them?
My best attempt:
(defun func ()
  (print "Enter first integer")
  (read)
  (print "Enter second integer")
  (read)
  (print (+ A B))
)
I have tried experimenting with (read A) or with a prefix to read (format t "~A" string) with no luck. All of the information on the internet that I have been looking for is extremely complicated, and I cannot make heads nor tails of it. Should it really be this hard? I may just be too familiar with bash/ksh/csh/sh...
You never assign the read input to your variables:
(defun func ()
  (print "Enter first integer")
  (finish-output)
  (let ((a (read)))
    (print "Enter second integer")
    (finish-output)
    (let ((b (read)))
      (print (+ a b)))))
Function reify allows me to look up information about a given name. For a function the returned value is VarI:
data Info = ... | VarI Name Type (Maybe Dec) Fixity | ...
Here I can examine the function's type, and I'd also like to examine its declaration. However, in the 3rd argument to VarI I always see Nothing. Is there a way to get the function's declaration?
From the Template Haskell docs on the VarI Info constructor:
A "value" variable (as opposed to a type variable, see TyVarI).
The Maybe Dec field contains Just the declaration which defined the variable -- including the RHS of the declaration -- or else Nothing, in the case where the RHS is unavailable to the compiler. At present, this value is always Nothing: returning the RHS has not yet been implemented because of lack of interest.
Looking at the ghc source mirror on github, the string VarI only appears twice, and both in the compiler/typecheck/TcSplice.lhs implementing the reifyThing function:
reifyThing :: TcTyThing -> TcM TH.Info
-- The only reason this is monadic is for error reporting,
-- which in turn is mainly for the case when TH can't express
-- some random GHC extension
reifyThing (AGlobal (AnId id))
  = do { ty <- reifyType (idType id)
       ; fix <- reifyFixity (idName id)
       ; let v = reifyName id
       ; case idDetails id of
           ClassOpId cls -> return (TH.ClassOpI v ty (reifyName cls) fix)
           _ -> return (TH.VarI v ty Nothing fix)
       }
reifyThing (AGlobal (ATyCon tc)) = reifyTyCon tc
reifyThing (AGlobal (ADataCon dc))
  = do { let name = dataConName dc
       ; ty <- reifyType (idType (dataConWrapId dc))
       ; fix <- reifyFixity name
       ; return (TH.DataConI (reifyName name) ty
                             (reifyName (dataConOrigTyCon dc)) fix)
       }
reifyThing (ATcId {tct_id = id})
  = do { ty1 <- zonkTcType (idType id) -- Make use of all the info we have, even
                                       -- though it may be incomplete
       ; ty2 <- reifyType ty1
       ; fix <- reifyFixity (idName id)
       ; return (TH.VarI (reifyName id) ty2 Nothing fix) }
reifyThing (ATyVar tv tv1)
  = do { ty1 <- zonkTcTyVar tv1
       ; ty2 <- reifyType ty1
       ; return (TH.TyVarI (reifyName tv) ty2) }
reifyThing thing = pprPanic "reifyThing" (pprTcTyThingCategory thing)
Like the template haskell docs said, the value used for that field is always Nothing.
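To check this on your own compiler, a small sketch along the following lines should work. It assumes a template-haskell version in which VarI has three fields (Name, Type, Maybe Dec); with the older four-field form shown in the question, the pattern would need an extra wildcard for the Fixity. The names double and declInfo are made up for the example.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH

double :: Int -> Int
double x = x * 2

-- An empty declaration splice ends the current declaration group,
-- so that `double` is already type-checked when it is reified below.
$(pure [])

-- Reify `double` at compile time and record whether a declaration came back.
declInfo :: String
declInfo = $(do
  info <- reify 'double
  case info of
    VarI _ _ mdec -> stringE (maybe "Nothing" (const "Just <declaration>") mdec)
    _ -> stringE "not a plain variable")

main :: IO ()
main = putStrLn declInfo
```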
Digging deeper, this code was added in 2003, in what looks like a rewrite of the reify system. So there does appear to be little interest in getting it working, since that field has always had the value Nothing for more than 10 years. So I'm guessing that if you want the feature you will have to implement it yourself (or propose a good use case on the GHC development mailing list that would encourage someone else to do it).