Debugging and understanding "tying the knot" in a monadic context - haskell

I'm trying to implement an interpreter for a programming language with lazy-binding in Haskell.
I'm using the tying-the-knot pattern to implement the evaluation of expressions. However, I found it extremely hard to debug and to reason about. I spent at least 40 hours working on this. I learned a lot about laziness and tying-the-knot, but I haven't reached a solution yet and some behaviors still puzzle me.
Questions
Is there a sensible way to debug the knot and figure out what causes it to bottom?
GHC stacktrace (printed when using profiling options) shows which function inside the knot triggers a loop. But that's not helpful: I need to understand what makes the knot strict in the knot's definition, and I couldn't find a way to show this.
It's been really hard to understand why the knot bottoms, and I don't think it will be much easier the next time I have to debug something like this.
How should I tie the knot in a monadic context? I learned that a function like traverse is strict for most types and this causes the knot to bottom.
The only solution I can think of, is to remove the knot. That would increase the problem's complexity (every value would need to be re-computed every time), although this issue can be resolved by caching the value in a STRef: that's exactly what I would do in a strict language. I would prefer to avoid this solution and take advantage of Haskell's laziness, otherwise what's the point of it?
In the code I provide later in this post, why does evalSt e1 terminate, while evalSt e2 doesn't? I still can't understand what the difference is.
Language's AST
I tried to simplify my AST as much as possible, and this is the most minimal definition I could come up with:
import Data.List (intercalate)
import qualified Data.Map as M
data Expr = Int Int | Negate Expr | Id String | Obj (M.Map String Expr)
  deriving (Eq, Ord, Show)
pprint :: Expr -> String
pprint e = case e of
  Int i -> show i
  Negate i -> "(-" ++ pprint i ++ ")"
  Id i -> i
  Obj obj -> "{" ++ intercalate ", "
    [ k ++ ":" ++ pprint v | (k,v) <- M.toList obj ] ++ "}"
Example programs
Here are a couple of example expressions represented with the AST above:
-- expression: {a:{aa1:(-b), aa2:ab, ab:(-b)}, b:3}
-- should evaluate to: {a:{aa1:-3, aa2:-3, ab:-3}, b:3}
e1 = Obj $ M.fromList [
  ("a", Obj $ M.fromList [
    ("aa1", Negate $ Id "b"),
    ("aa2", Id "ab"),
    ("ab", Negate $ Id "b")
    ]),
  ("b", Int 3)
  ]
-- expression: {a:{aa:(-ab), ab:b}, b:3}
-- should evaluate to: {a:{aa:-3, ab:3}, b:3}
e2 = Obj $ M.fromList [
  ("a", Obj $ M.fromList [
    ("aa", Negate $ Id "ab"),
    ("ab", Id "b")
    ]),
  ("b", Int 3)
  ]
Pure eval function
I have then defined a function to evaluate an expression. This is the most simple definition I could write:
type Scope = M.Map String Expr
eval :: Scope -> Expr -> Expr
eval scope expr = case expr of
  Int i -> Int i
  Id str -> case M.lookup str scope of
    Just e -> e
    Nothing -> error $ str ++ " not in scope"
  Negate aE ->
    case eval scope aE of
      Int i -> Int $ -i
      _ -> error $ "Can only negate ints. Found: " ++ pprint aE
  Obj kvMap -> Obj $
    let resMap = fmap (eval (M.union resMap scope)) kvMap
    in resMap
Tying the knot
The most interesting part of the eval function is the knot-tying in the Obj kvMap case:
let resMap = fmap (eval (M.union resMap scope)) kvMap
in resMap
The idea is that in order to compute the expressions in kvMap, the identifiers need to be able to access both the values in scope and the results of the expressions in kvMap. The computed values are resMap, and to compute them we use the scope resMap ⋃ scope.
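The same idea can be seen in isolation in a minimal sketch outside the interpreter. The names knot and lookupA are only illustrative, and the sketch assumes the lazy Data.Map from the containers package: the map's values are thunks that refer back to the map itself.

```haskell
import qualified Data.Map as M

-- A map defined in terms of itself. Data.Map is strict in its keys but lazy
-- in its values, so the value stored under "a" stays an unevaluated thunk
-- until it is looked up, by which time the map's structure already exists.
knot :: M.Map String Int
knot = M.fromList
  [ ("a", negate (knot M.! "b"))
  , ("b", 3)
  ]

-- Forcing the value under "a" looks up "b" in the very same map.
lookupA :: Int
lookupA = knot M.! "a"
```

The knot only holds because building the map never forces its values; anything that forces a value while the map is still being built would loop.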
It works!
This eval function works as expected:
GHCi> pprint $ eval M.empty e1
"{a:{aa1:-3, aa2:-3, ab:-3}, b:3}"
GHCi> pprint $ eval M.empty e2
"{a:{aa:-3, ab:3}, b:3}"
Monadic evaluation
The limitation of the eval function above is that it's pure. In some cases I need to evaluate expressions in a monadic context. For instance, I may need IO to offer non-pure functions to the guest language.
I've implemented dozens of versions of eval (both monadic, using RecursiveDo, and of various degrees of purity) in an attempt to understand the issues. I'm presenting the two most interesting ones:
Passing the scope through a State monad
evalSt' :: Expr -> State Scope Expr
evalSt' expr = do
  scope <- get
  case expr of
    Int i -> pure $ Int i
    Id str -> case M.lookup str scope of
      Just e -> pure e
      Nothing -> error $ str ++ " not in scope"
    Negate aE -> do
      a <- evalSt' aE
      case a of
        Int i -> pure $ Int $ -i
        _ -> error $ "Can only negate ints. Found: " ++ pprint aE
    Obj obj -> mdo
      put $ M.union newScope scope
      newScope <- traverse evalSt' obj
      put scope
      pure $ Obj newScope
evalSt scope expr = evalState (evalSt' expr) scope
This function is able to evaluate the program e1, but it bottoms (never returns) on e2:
GHCi> pprint $ evalSt M.empty e1
"{a:{aa1:-3, aa2:-3, ab:-3}, b:3}"
GHCi> pprint $ evalSt M.empty e2
"{a:{aa:
I still don't understand how it can compute e1, since it does contain Ids: isn't that program strict on the scope, and shouldn't it make evalSt bottom? Why doesn't it? And what is different in e2 that causes the function not to terminate?
Evaluating in the IO monad
evalM :: Scope -> Expr -> IO Expr
evalM scope expr = case expr of
  Int i -> pure $ Int i
  Id str -> case M.lookup str scope of
    Just e -> pure e
    Nothing -> error $ str ++ " not in scope"
  Negate aE -> do
    a <- evalM scope aE
    case a of
      Int i -> pure $ Int $ -i
      _ -> error $ "Can only negate ints. Found: " ++ pprint aE
  Obj kvMap -> mdo
    resMap <- traverse (evalM (M.union resMap scope)) kvMap
    pure $ Obj resMap
This function always bottoms (never returns) on every program that uses at least one Id node. Even just {a:1, b:a}.
Scroll back to the top for the questions :-)

How should I tie the knot in a monadic context?
Your pure evaluation function relies on there being no evaluation order in the semantics of Haskell, so that thunks get forced only when needed. In contrast, most effects are fundamentally ordered, so there is an incompatibility there.
Some monads are lazier than the others, and for those you can get some result out of making your evaluation function monadic, as you've seen with evalSt e1. The two most common lazy monads are Reader and lazy State (which is the one you get from Control.Monad.State, as opposed to Control.Monad.State.Strict).
But for other effects, such as IO, you must control evaluation order explicitly, and that means implementing the cache for lazy evaluation explicitly (via STRef for example), instead of implicitly relying on Haskell's own runtime.
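A minimal sketch of such an explicit cache in IO, under stated assumptions: it uses IORef (the IO counterpart of STRef), and the names Thunks, once and evalIO are hypothetical helpers, not part of the question's code. Each field of an object becomes an IO action that runs at most once and then returns its cached result; the scope is published through a mutable reference after all the thunks have been created, so no mdo is needed.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import Data.List (intercalate)
import qualified Data.Map as M

data Expr = Int Int | Negate Expr | Id String | Obj (M.Map String Expr)
  deriving (Eq, Ord, Show)

pprint :: Expr -> String
pprint e = case e of
  Int i -> show i
  Negate i -> "(-" ++ pprint i ++ ")"
  Id i -> i
  Obj obj -> "{" ++ intercalate ", "
    [ k ++ ":" ++ pprint v | (k, v) <- M.toList obj ] ++ "}"

-- In this sketch a scope maps each name to a memoized IO action.
type Thunks = M.Map String (IO Expr)

-- Wrap an action so that it runs at most once; later calls return the cache.
-- A cyclic definition such as {a:a} would still recurse forever.
once :: IO a -> IO (IO a)
once act = do
  ref <- newIORef Nothing
  pure $ do
    cached <- readIORef ref
    case cached of
      Just v -> pure v
      Nothing -> do
        v <- act
        writeIORef ref (Just v)
        pure v

evalIO :: Thunks -> Expr -> IO Expr
evalIO scope expr = case expr of
  Int i -> pure (Int i)
  Id str -> case M.lookup str scope of
    Just thunk -> thunk
    Nothing -> error $ str ++ " not in scope"
  Negate aE -> do
    a <- evalIO scope aE
    case a of
      Int i -> pure $ Int $ negate i
      _ -> error $ "Can only negate ints. Found: " ++ pprint aE
  Obj kvMap -> do
    -- Create the thunks first; each one reads the final scope only when run.
    scopeRef <- newIORef scope
    thunks <- traverse (\e -> once (readIORef scopeRef >>= \s -> evalIO s e)) kvMap
    writeIORef scopeRef (M.union thunks scope)
    -- Now force every field, in key order.
    Obj <$> sequenceA thunks

-- expression: {a:{aa:(-ab), ab:b}, b:3}
e2 :: Expr
e2 = Obj $ M.fromList
  [ ("a", Obj $ M.fromList [("aa", Negate $ Id "ab"), ("ab", Id "b")])
  , ("b", Int 3)
  ]
```

Here the evaluation order is whatever the thunks demand of each other at run time, which is the job the Haskell runtime does implicitly in the pure version.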
In the code I provide later in this post, why does evalSt e1 terminate, while evalSt e2 doesn't? I still can't understand what the difference is.
To see what is going wrong, unfold traverse evalSt' obj where obj is {aa:(-ab), ab:b}.
traverse evalSt' obj
=
do
  x <- evalSt' (Negate (Id "ab"))
  y <- evalSt' (Id "b")
  pure [("aa", x), ("ab", y)]
=
do
  -- evalSt' (Negate (Id "ab"))
  scope1 <- get -- unused
  a <- evalSt' (Id "ab")
  x <- case a of
    Int i -> pure $ Int $ -i
    _ -> error ...
  -- evalSt' (Id "b")
  scope2 <- get
  y <- case M.lookup "b" scope2 of
    Just e -> pure e
    Nothing -> error ...
  pure [("aa", x), ("ab", y)]
1. We try to print the object e2, which ends up looking at the value of the "aa" field, which is x in the code above.
2. x comes from case a of ..., which needs a.
3. a comes from evalSt' (Id "ab"), which needs the field "ab", which is y (from the knot tying surrounding the traverse evalSt' obj we are looking at).
4. y comes from case M.lookup "b" scope2 of ..., which needs scope2.
5. scope2 comes from get, which gets the output state from the action preceding it, which is evaluating x.
6. We are already trying to evaluate x (from step 2). Hence there is an infinite loop.
This can be fixed by always restoring the state at the end of evalSt' (technically, you only need to do this for Id and Negate, but might as well do it always):
evalSt' e = do
  scope <- get
  v <- case e of ...
  put scope
  pure v
Or use Reader instead, which gives you the power to update state locally for subcomputations, which is exactly what you need here. You can use local to surround traverse evalSt' obj:
newScope <- local (const (newScope `M.union` scope)) (traverse evalSt' obj)
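Putting the Reader suggestion together, a sketch of the whole evaluator might look as follows. The name evalR is hypothetical, and the sketch assumes Reader from the mtl package plus the RecursiveDo extension for mdo. Only Id reads the scope, and only Obj changes it, locally, for its own fields.

```haskell
{-# LANGUAGE RecursiveDo #-}
import Control.Monad.Reader (Reader, ask, local, runReader)
import Data.List (intercalate)
import qualified Data.Map as M

data Expr = Int Int | Negate Expr | Id String | Obj (M.Map String Expr)
  deriving (Eq, Ord, Show)

pprint :: Expr -> String
pprint e = case e of
  Int i -> show i
  Negate i -> "(-" ++ pprint i ++ ")"
  Id i -> i
  Obj obj -> "{" ++ intercalate ", "
    [ k ++ ":" ++ pprint v | (k, v) <- M.toList obj ] ++ "}"

type Scope = M.Map String Expr

evalR :: Expr -> Reader Scope Expr
evalR expr = case expr of
  Int i -> pure $ Int i
  Id str -> do
    scope <- ask
    case M.lookup str scope of
      Just e -> pure e
      Nothing -> error $ str ++ " not in scope"
  Negate aE -> do
    a <- evalR aE
    case a of
      Int i -> pure $ Int $ negate i
      _ -> error $ "Can only negate ints. Found: " ++ pprint aE
  Obj obj -> mdo
    scope <- ask
    -- The fields are evaluated in a scope that already contains their results.
    newScope <- local (const (M.union newScope scope)) (traverse evalR obj)
    pure $ Obj newScope

-- expression: {a:{aa1:(-b), aa2:ab, ab:(-b)}, b:3}
e1 :: Expr
e1 = Obj $ M.fromList
  [ ("a", Obj $ M.fromList
      [("aa1", Negate $ Id "b"), ("aa2", Id "ab"), ("ab", Negate $ Id "b")])
  , ("b", Int 3)
  ]

-- expression: {a:{aa:(-ab), ab:b}, b:3}
e2 :: Expr
e2 = Obj $ M.fromList
  [ ("a", Obj $ M.fromList [("aa", Negate $ Id "ab"), ("ab", Id "b")])
  , ("b", Int 3)
  ]
```

Running pprint (runReader (evalR e2) M.empty) should then give the same string as the pure eval. Because there is no state threaded from one field to the next, evaluating one field can no longer depend on the field before it.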
Is there a sensible way to debug the knot and figure out what causes it bottom?
I don't have a good answer to this. I'm not familiar with debugging tools in Haskell.
You cannot rely on stack traces because subexpressions may force each other in a rather chaotic order. And there is something interfering with print-debugging (Debug.Trace) that I don't understand. (I would add Debug.Trace.trace (pprint expr) $ do at the beginning of evalSt', but then the trace doesn't make sense to me because things that should be printed once are replicated many times.)

Related

Confusing type mismatch error in nested do blocks

I'm trying to write an interpreter for a simple embedded scripting language.
The core of it is the eval function, which has the following signature.
type EvalState = () --for later
type EvalResult = State EvalState (IO (Either T.Text Value))
eval :: Expr -> EvalResult
The result type is like this because it is stateful, eval should be able to do IO, and it can fail.
There are two datatypes: Expressions and Values and eval converts expression to values. For simple expressions, literals of primitive data types, it's implemented like this:
ok :: Value -> EvalResult
ok val = state (return . Right $ val,)
eval :: Expr -> EvalResult
eval (StrLit t) = ok (StrVal t)
eval (SymLit t) = ok (SymbolVal t)
eval (IntLit i) = ok (IntVal i)
eval (FloatLit f) = ok (FloatVal f)
My problem now is implementing eval for the list literal. Currently my code looks like this:
eval (ListLit elems) = do
  elems <- mapM eval elems
  do
    opts <- sequence elems
    return $ fmap (Right . ListVal . V.fromList) (sequence opts)
And this produces the following error:
/home/felix/git/vmail/src/Interpreter/EvalAst.hs:37:13: error:
• Couldn't match type ‘IO’
with ‘StateT EvalState Data.Functor.Identity.Identity’
Expected: StateT
EvalState Data.Functor.Identity.Identity [Either T.Text Value]
Actual: IO [Either T.Text Value]
• In a stmt of a 'do' block: opts <- sequence elems
In a stmt of a 'do' block:
do opts <- sequence elems
fmap (Right . ListVal . V.fromList) (sequence opts)
In the expression:
do elems <- mapM eval elems
do opts <- sequence elems
fmap (Right . ListVal . V.fromList) (sequence opts)
|
37 | opts <- sequence elems
| ^^^^^^^^^^^^^^
My problem is that I'm not understanding this error. My thinking goes like this: the first do puts me in the State monad, so I should be able to "extract" the results from mapM eval elems, which I expect to be [ IO (Either ...) ]. The next do should put me in the IO monad (because that's the next inner type in the result type), so I should be able to extract IO values, and as far as I understood it, sequence elems should give me an IO [ Either ... ] which I can then extract. So what is my mistake?
Monads do not compose, in general. I think you are being bitten by this.
In general, if m and n are monads, we cannot write m (n a) for a monadic action that mixes the features of m and n. In general, it could even fail to be a monad.
In your case, you are using something like State A (IO B) hoping to be able to access the state of type A and still do IO, but that's not the case.
Indeed, by definition we have (up to some wrappers):
State a b = a -> (a,b)
            |     | |-- result
            |     |-- new state
            |-- old state
which, in your case of State A (IO B), becomes
State A (IO B) = A -> (A, IO B)
                 |     |  |-- result
                 |     |-- new state
                 |-- old state
Here, we can see that the new state must be generated without doing IO at all! This is a computation which forces the new state and the IO effects to be completely separated. It effectively behaves as if we had two separate functions
A -> A -- state update, no IO here
A -> IO B -- IO depending on the old state
Likely, that's not what you actually want. That is probably something like
A -> IO (A, B)
allowing the new state to be generated after IO is done.
To obtain that type, you can not nest monads, but you need a monad transformer like StateT:
StateT A IO B = A -> IO (A, B)
This is a monad. You will probably need to use lift or liftIO to convert IO actions to this type, but it should behave as desired.
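As a rough sketch of what eval could look like on top of StateT, with simplified stand-ins for the question's types: Expr and Value are cut down to a few constructors, the error type is String rather than T.Text, and EvalState is a plain Int counter, all of which are assumptions for illustration only.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.State (StateT, modify, runStateT)

-- Simplified, hypothetical stand-ins for the question's Expr and Value types.
data Expr = IntLit Int | Var String | ListLit [Expr]
data Value = IntVal Int | ListVal [Value] deriving (Eq, Show)

type EvalState = Int                      -- here: number of nodes evaluated
type Eval = StateT EvalState IO           -- state and IO in a single monad

eval :: Expr -> Eval (Either String Value)
eval (IntLit i) = do
  modify (+ 1)
  pure (Right (IntVal i))
eval (Var v) = do
  modify (+ 1)
  pure (Left ("unbound variable: " ++ v))
eval (ListLit es) = do
  modify (+ 1)
  liftIO (putStrLn "evaluating a list")   -- IO is available via liftIO
  results <- mapM eval es                 -- results :: [Either String Value]
  -- sequence turns [Either e a] into Either e [a], keeping the first error
  pure (ListVal <$> sequence results)
```

A single do block is enough here, because every statement lives in the one combined monad. Running runStateT (eval expr) 0 yields an IO action that returns both the result and the final state.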

How to avoid the IO monad when solving arithmetic problems in SBV

I am trying to solve arithmetic problems with SBV.
For example
solution :: SymbolicT IO ()
solution = do
  [x, y] <- sFloats ["x", "y"]
  constrain $ x + y .<= 2
Main> s1 = sat solution
Main> s2 = isSatisfiable solution
Main> s1
Satisfiable. Model:
x = -1.2030502e-17 :: Float
y = -2.2888208e-37 :: Float
Main> :t s1
s1 :: IO SatResult
Main> s2
True
Main> :t s2
s2 :: IO Bool
While I can do useful things, it is easier for me to work with the pure value (SatResult or Bool) and not with the IO monad.
According to the documentation
sat :: Provable a => a -> IO SatResult
constrain :: SolverContext m => SBool -> m ()
sFloats :: [String] -> Symbolic [SFloat]
type Symbolic = SymbolicT IO
Given the type of functions I use, I understand why I always get to the IO monad.
But look at the generalized versions of the functions, for example sFloats:
sFloats :: MonadSymbolic m => [String] -> m [SFloat]
Depending on type of the function, I can work with a different monad than IO. This gives me hope that we will reach a more useful monad, the Identity monad for example.
Unfortunately, the examples I looked at always solve the problems within the IO monad, so I couldn't find any that would work for me. Besides that, I don't have much experience working with monads.
Finally, my question is:
Is there any way to avoid the IO monad when solving such a problem with SBV?
Thanks in advance
SBV calls out to the SMT solver of your choice (most likely z3, but others are available too), and presents the results back to you. This means that it performs IO under the hood, and thus you cannot be outside the IO monad. You can create custom monads using MonadSymbolic, but that will not get you out of the IO monad: Since the call to the SMT solver does IO you'll always be in IO.
(And I'd strongly caution against using unsafePerformIO, as suggested in one of the comments. This is really a bad idea, and you can find plenty of information elsewhere on why you shouldn't do so.)
Note that this is no different than any other IO based computation in Haskell: You perform the IO "in-the-wrapper," but once you get your results, you can do whatever you'd like to do with them in a "pure" environment.
Here's a simple example:
import Data.SBV
import Data.SBV.Control
example :: IO ()
example = runSMT $ do
  [x, y] <- sFloats ["x", "y"]
  constrain $ x + y .<= 2
  query $ do cs <- checkSat
             case cs of
               Unsat -> io $ putStrLn "Unsatisfiable"
               Sat -> do xv <- getValue x
                         yv <- getValue y
                         let result = use xv yv
                         io $ putStrLn $ "Result: " ++ show result
               _ -> error $ "Solver said: " ++ show cs
-- Use the results from the solver, in a purely functional way
use :: Float -> Float -> Float
use x y = x + y
Now you can say:
*Main> example
Result: -Infinity
The function example has type IO (), because it does involve calling out to the solver and getting the results. However, once you extract those results (via calls to getValue), you can pass them to the function use which has a very simple purely functional type. So, you keep the "wrapper" in the monad, but actual processing, use-of-the values, etc., remain in the pure world.
Alternatively, you can also extract the values and continue from there:
import Data.SBV
import Data.SBV.Control
example :: IO (Maybe (Float, Float))
example = runSMT $ do
  [x, y] <- sFloats ["x", "y"]
  constrain $ x + y .<= 2
  query $ do cs <- checkSat
             case cs of
               Unsat -> pure Nothing
               Sat -> do xv <- getValue x
                         yv <- getValue y
                         pure $ Just (xv, yv)
               _ -> error $ "Solver said: " ++ show cs
Now you can say:
*Main> Just (a, b) <- example
*Main> a
-Infinity
*Main> b
4.0302105e-21
Long story short: Don't avoid the IO monad. It's there for a very good reason. Get into it, get your results out, and then the rest of your program can remain purely functional, or whatever other monad you might find yourself in.
Note that none of this is really SBV specific. This is the usual Haskell paradigm of how to use functions with side-effects. (For instance, anytime you use readFile to read the contents of a file to process it further.) Do not try to "get rid of the IO." Instead, simply work with it.
Depending on type of the function, I can work with a different monad than IO.
Not meaningfully different, in the sense you'd hope. Every instance of this class is going to be some transformed version of IO. Sorry!
Time to make a plan that involves understanding and working with IO.

Apparent redundant calls in a IO monad?

Here is a snippet of code taken from the Haskell GPipe project (commented by myself, save the line with "Really?"). In the memoize function, I don't understand why its author calls the getter a second time to cache a newly computed value. It doesn't seem necessary to me, and it can be removed without apparent bad consequences (at least, a medium-sized project of mine still works without it).
{- | A map (SN stands for stable name) to cache the results 'a' of computations 'm a'.
The type 'm' ends up being constrained to 'MonadIO m' in the various functions using it.
-}
newtype SNMap m a = SNMap (HT.BasicHashTable (StableName (m a)) a)
newSNMap :: IO (SNMap m a)
newSNMap = SNMap <$> HT.new
memoize :: MonadIO m
        => m (SNMap m a) -- ^ A "IO call" to retrieve our cache.
        -> m a -- ^ The "IO call" to execute and cache the result.
        -> m a -- ^ The result being naturally also returned.
memoize getter m = do
  s <- liftIO $ makeStableName $! m -- Does forcing the evaluation make sense here (since we try to avoid it...)?
  SNMap h <- getter
  x <- liftIO $ HT.lookup h s
  case x of
    Just a -> return a
    Nothing -> do
      a <- m
      SNMap h' <- getter -- Need to redo because of scope. <- Really?
      liftIO $ HT.insert h' s a
      return a
I get it. The scope term used is not related to the Haskell 'do' scope. It is simply that a computation could recursively update the cache when evaluated (as in the scopedM function in the same module...). It is kind of obvious in retrospect.

How to hide state from functions that call other functions that use that state

I would like to have some higher level functions in my Haskell program call other functions that eventually call functions that use some state or configuration, and not have to pass the state around all these function calls. I understand this is a classic use of the state monad (or possibly the Reader monad?).
(I'm also not sure if it should be StateT (as in my example below) to enable doing IO, or if results should somehow be output separately.)
At this stage I'm pretty confused by all the tutorials, blog posts, and similar questions here, and can't pick out the solution. Or have I misunderstood the hiding thing?
Here's a small example:
import Control.Monad.State
-- Here's a simple configuration type:
data Config = MkConfig {
    name :: String
  , num :: Int
  } deriving Show
-- Here's a couple of configurations.
-- (They're hard coded and pre-defined.)
c1 = MkConfig "low" 7
c2 = MkConfig "high" 10
-- Here's a lower level function that explicitly uses the config.
-- (The String is ignored here for simplicity, but it could be used.)
fun :: Config -> Int -> Int
fun (MkConfig _ i) j = i*j
-- testA and GoA work fine as expected.
-- fun uses the different configs c1,c2 in the right way.
testA = do
  a <- get
  lift (print (fun a 2))
  put c2
  a <- get
  lift (print (fun a 4))
goA = evalStateT testA c1
-- (c1 could be put at the start of testA instead.)
-- But what I really want is to use fun2 that calls fun,
-- and not explicitly need state.
-- But this function definition does not compile:
fun2 :: Int -> Int
fun2 j = 3 * fun cf j
-- fun needs a config arg cf, but where from?
-- I would like a similar way of using fun2 as in testB and goB here.
testB = do
  a <- get
  lift (print (fun2 3)) -- but fun2 doesn't take the state in a
  put c2
  a <- get
  lift (print (fun2 42)) -- but fun2 doesn't take the state in a
goB = evalStateT testB c1
I want to hide the configuration away from the higher level functions like fun2 in my program, while still retaining the ability to change configuration and run those functions with the new configuration. This is a 'how to do it question' (unless I've got the wrong idea completely).
You can't quite "hide the configuration away" in the type signature, of course: a plain old function Int -> Int must be referentially transparent, and so it can't also depend on or accept some Config value.
What you probably want to do is something like:
fun2 :: Int -> State Config Int -- An `Int -> Int` that depends on `Config` state.
                                -- Compare to how `Int -> IO Int` is like an
                                -- `Int -> Int` function that depends on IO.
fun2 j = do
  cfg <- get -- named cfg to avoid shadowing the top-level c1
  return (3 * fun cfg j)
And then wherever you have a c :: Config, you can get the result by something like
let result = evalState (fun2 42) c -- An Int.
See also Combining StateT IO with State:
hoistState :: Monad m => State s a -> StateT s m a
hoistState = StateT . (return .) . runState
Then you can write something like
testB :: StateT Config IO ()
testB = do
  -- Fancy:
  result <- hoistState (fun2 42)
  -- Equivalent:
  c <- get
  let result' = evalState (fun2 42) c
  lift (print (result, result'))
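Assembled into one file, the pieces above might look like the following sketch. It reuses the question's Config, c1, c2 and fun, returns the two results from testB so they can be inspected (a small departure from the original testB, made only for illustration), and assumes the mtl package.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.State (State, StateT (StateT), evalStateT, get, put, runState)

data Config = MkConfig
  { name :: String
  , num :: Int
  } deriving Show

c1, c2 :: Config
c1 = MkConfig "low" 7
c2 = MkConfig "high" 10

-- Lower-level function that takes the configuration explicitly.
fun :: Config -> Int -> Int
fun (MkConfig _ i) j = i * j

-- Higher-level function: the configuration comes from the State monad.
fun2 :: Int -> State Config Int
fun2 j = do
  cfg <- get
  return (3 * fun cfg j)

-- Run a pure State computation inside StateT over any monad.
hoistState :: Monad m => State s a -> StateT s m a
hoistState = StateT . (return .) . runState

testB :: StateT Config IO [Int]
testB = do
  r1 <- hoistState (fun2 3)    -- uses the initial configuration
  put c2                       -- switch configuration
  r2 <- hoistState (fun2 42)   -- uses c2
  liftIO (print (r1, r2))
  return [r1, r2]

goB :: IO [Int]
goB = evalStateT testB c1
```

The callers of fun2 never mention the Config value; it only shows up in the type, which is as hidden as it can get while staying referentially transparent.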

How do I abstract this pattern in haskell?

Scenario: I have an interpreter that builds up values bottom-up from an AST. Certain nodes come with permissions -- additional boolean expressions. Permission failures should propagate, but if a node above in the AST comes with a permission, a success can recover the computation and stop the propagation of the error.
At first I thought the Error MyError MyValue monad would be enough: one of the members of MyError could be PermError, and I could use catchError to recover from PermError if the second check succeeds. However, MyValue is gone by the time I get to the handler. I guess there could ultimately be a way of having PermError carry a MyValue field so that the handler could restore it, but it would probably be ugly and checking for an exception at each step would defeat the concept of an exceptional occurrence.
I'm trying to think of an alternative abstraction. Basically I have to return a datatype Either AllErrorsExceptPermError (Maybe PermError, MyValue) or more simply (Maybe AllErrors, MyValue) (the other errors are unrecoverable and fit the error monad pretty well) and I'm looking for something that would save me from juggling the tuple around, since there seems to be a common pattern in how the operations are chained. My haskell knowledge only goes so far. How would you use haskell to your advantage in this situation?
While writing this I came up with an idea (SO is a fancy rubber duck): a Monad that internally handles a type (a, b) (and ultimately returns it when the monadic computation terminates, there has to be some kind of runMyMonad), but lets me work with the type b directly as much as possible. Something like
data T = Pass | Fail | Nothing
instance Monad (T , b) where
return v = (Nothing, v)
(Pass, v) >>= g = let (r', v') = g v in (if r' == Fail then Fail else Pass, v')
(Fail, v) >>= g = let (r', v') = g v in (if r' == Pass then Pass else Fail, v')
(Nothing, _) >>= g = error "This should not have been propagated, all chains should start with Pass or Fail"
errors have been simplified into T, and the instance line probably has a syntax error, but you should get the idea. Does this make sense?
I think you can use the State monad for permissions and value calculation, and wrap that inside the ErrorT monad transformer to handle the errors. Below is an example that shows the idea. Here the calculation is summing up a list, the "permissions" are the number of even numbers in the list, and the error condition is seeing a 0 in the list.
import Control.Monad.Error
import Control.Monad.State
data ZeroError = ZeroError String
  deriving (Show)
instance Error ZeroError where
fun :: [Int] -> ErrorT ZeroError (State Int) Int
fun [] = return 0
fun (0:xs) = throwError $ ZeroError "Zero found"
fun (x:xs) = do
  i <- get
  put $ (if even(x) then i+1 else i)
  z <- fun xs
  return $ x+z
main = f $ runState (runErrorT $ fun [1,2,4,5,10]) 0
  where
    f (Left e,evens) = putStr $ show e
    f (Right r,evens) = putStr $ show (r,evens)
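Control.Monad.Error and ErrorT were deprecated and later removed from mtl (as of mtl 2.3), so on a current toolchain the same idea can be sketched with ExceptT instead. The names sumCountingEvens and run are hypothetical; the logic mirrors fun above.

```haskell
import Control.Monad.Except (throwError)
import Control.Monad.State (State, get, put, runState)
import Control.Monad.Trans.Except (ExceptT, runExceptT)

newtype ZeroError = ZeroError String
  deriving (Eq, Show)

-- Sum a list, counting even elements in the state and failing on a 0.
sumCountingEvens :: [Int] -> ExceptT ZeroError (State Int) Int
sumCountingEvens [] = return 0
sumCountingEvens (0 : _) = throwError (ZeroError "Zero found")
sumCountingEvens (x : xs) = do
  i <- get
  put (if even x then i + 1 else i)
  z <- sumCountingEvens xs
  return (x + z)

-- Because State is the inner monad, the counter survives an error.
run :: [Int] -> (Either ZeroError Int, Int)
run xs = runState (runExceptT (sumCountingEvens xs)) 0
```

With this stacking order, run returns the state accumulated up to the point of failure alongside the Left; with the opposite order (StateT over Either) the state would be lost when an error is thrown.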
