I'm writing an interpreter for a small language.
This language supports mutation, so its evaluator keeps track of a Store for all the variables (where type Store = Map.Map Address Value, type Address = Int, and data Value is a language-specific ADT).
It's also possible for computations to fail (e.g., dividing by zero), so the result has to be an Either String Value.
The type of my interpreter, then, is
eval :: Environment -> Expression -> State Store (Either String Value)
where type Environment = Map.Map Identifier Address keeps track of local bindings.
For example, interpreting a constant literal doesn't need to touch the store, and the result always succeeds, so
eval _ (LiteralExpression v) = return $ Right v
But when we apply a binary operator, we do need to consider the store.
For example, if the user evaluates (+ (x <- (+ x 1)) (x <- (+ x 1))) and x is initially 0, then the final result should be 3, and x should be 2 in the resulting store.
This leads to the case
eval env (BinaryOperator op l r) = do
lval <- eval env l
rval <- eval env r
return $ join $ liftM2 (applyBinop op) lval rval
Note that the do-notation is working within the State Store monad.
Furthermore, the use of return is monomorphic in State Store, while the uses of join and liftM2 are monomorphic in the Either String monad.
That is, here we use
(return . join) :: Either String (Either String Value) -> State Store (Either String Value)
and return . join is not a no-op.
(As is evident, applyBinop :: Identifier -> Value -> Value -> Either String Value.)
This seems confusing at best, and this is a relatively simple case.
The case of function application, for example, is considerably more complicated.
What useful best practices should I know about to keep my code readable—and writable?
EDIT: Here's a more typical example, which better showcases the ugliness.
The NewArrayC variant has parameters length :: Expression and element :: Expression (it creates an array of a given length with all elements initialized to a constant).
A simple example is (newArray 3 "foo"), which yields ["foo", "foo", "foo"], but we could also write (newArray (+ 1 2) (concat "fo" "oo")), because we can have arbitrary expressions in a NewArrayC.
But when we actually call
allocateMany :: Int -> Value -> State Store Address,
which takes the number of elements to allocate and the value for each slot, and returns the starting address, we need to unpack those values.
In the logic below, you can see that I'm duplicating a bunch of logic that should be built-in to the Either monad.
All the cases should just be binds.
eval env (NewArrayC len el) = do
lenVal <- eval env len
elVal <- eval env el
case lenVal of
Right (NumV lenNum) -> case elVal of
Right val -> do
addr <- allocateMany lenNum val
return $ Right $ ArrayV addr lenNum -- result data type
left -> return left
Right _ -> return $ Left "expected number in new-array length"
left -> return left
This is what monad transformers are for. There is a StateT transformer to add state to a stack, and an EitherT transformer to add Either-like failure to a stack; however, I prefer ExceptT (which adds Except-like failure), so I will give my discussion in terms of that. Since you want the stateful bit outermost, you should use ExceptT e (State s) as your monad.
type DSL = ExceptT String (State Store)
Note that the stateful operations can be spelled get and put, and these are polymorphic over all instances of MonadState; so that in particular they will work okay in our DSL monad. Similarly, the canonical way to raise an error is throwError, which is polymorphic over all instances of MonadError String; and in particular will work okay in our DSL monad.
So now we would write
eval :: Environment -> Expression -> DSL Value
eval _ (Literal v) = return v
eval e (Binary op l r) = liftM2 (applyBinop op) (eval e l) (eval e r)
You might also consider giving eval a more polymorphic type; it could return an (MonadError String m, MonadState Store m) => m Value instead of a DSL Value. In fact, for allocateMany, it's important that you give it a polymorphic type:
allocateMany :: MonadState Store m => Int -> Value -> m Address
There's two pieces of interest about this type: first, because it is polymorphic over all MonadState Store m instances, you can be just as sure that it only has stateful side effects as if it had the type Int -> Value -> State Store Address that you suggested. However, also because it is polymorphic, it can be specialized to return a DSL Address, so it can be used in (for example) eval. Your example eval code becomes this:
eval env (NewArrayC len el) = do
lenVal <- eval env len
elVal <- eval env el
case lenVal of
NumV lenNum -> allocateMany lenNum elVal
_ -> throwError "expected number in new-array length"
I think that's quite readable, really; nothing too extraneous there.
Related
Let's say you want to write some stateful function in Haskell.
You have to use a monadic style like this: (using whatever state monad)
f :: x -> k -> (ST s) r
So this means that essentially the function takes some input x and k, might use and/or modify the world to compute a return value r.
Assume x is a stateful structure, that might be modified by f. Assume k is just a simple key type used for example to access something in x. k itself will be assigned a simple number type later, but we don't want to have to decide now of its type.
So essentially I know that x is a mutable thing, and k is immutable.
The problem is just looking at f's signature, we cannot tell that, so if f occurs in the body of some more complex monadic code we can't reason very well about those variables.
Example:
g :: x -> k -> (ST s) r
g a i = do
...
f a i -- I don't know if i :: k depends on state
... --- I don't know if i was changed by f
What I mean is that if I was given a variable i of unknown type k, I don't know whether or not it depends on s and whether its value can be affected by a call of f.
This problem of course does not exist when writing pure functions, since everything is immutable.
Is there a way to conveniently annotate and, more importantly, statically enforce that k will remain unchanged in the ST monad when calling f?
Inside ST, you can definitely tell what is mutable: an Int is always an immutable integer, while a STRef s Int is an (immutable) reference to a mutable Int.
Hence,
f :: STRef s Int -> String -> (ST s) Bool
can (read and) modify the Int pointed to the first argument, but can only read the immutable string passed as second argument.
On top of that, f might create (and mutate) new STRef s references to newly allocated values. It can also modify other valued if f was defined using a reference to such values. E.g. in
bar :: forall s . ST s ()
bar = do
x_ref <- newSTRef "hello"
let f :: STRef s String -> String -> ST s ()
f y_ref str = do
y <- readSTRef y_ref
writeSTRef x_ref y
writeSTRef y_ref (y ++ " change " ++ str)
...
calling f will alter both the string which was originally set to "hello" and the string whose reference is passed to f.
In your own example:
g :: x -> k -> (ST s) r
g a i = do
...
f a i -- I don't know if i :: k depends on state
... --- I don't know if i was changed by f
If i :: k was not a reference, it still has the same value. If it was a reference, the referred value might have been changed by f a i.
I am looking at this compiler code snippet and do not understand what evalState does, being new to State Monad.
compileToAst :: FilePath -> String -> Either Errors (Contract (Check Type, Env, SourcePos))
compileToAst source code = case parse parser source code of
Right ast -> let ast' = evalState ast [globals]
errors = lefts $ map ann $ toList ast'
ann (a, _, pos) = a `extend` sourcePosPretty pos
in if null errors then Right ast' else Left errors
Left err -> Left [(SyntaxError $ parseErrorTextPretty err, sourcePosPretty . NE.head $ errorPos err)]
Assuming stateful computation is in the form of s -> (a, s),
ast is a monad, [globals] is s, and evalState ast [globals] returns type a. Where can I find the stateful computation definition transforming s to new s and yielding result a?
The function evalState has type:
evalState :: State s a -> s -> a
The type of the first argument, namely State s a, is actually isomorphic to the function type s -> (a, s). What this means formally is that there exist two functions that convert between them:
runState :: State s a -> (s -> (a, s))
state :: (s -> (a, s)) -> State s a
and if you apply one of these functions and then the other, you get back what you started with (i.e., they are inverses, and their composition is the identity function).
Less formally, it means that wherever you see State s a you can pretend it's the type s -> (a, s) and vice versa, since you can convert back and forth at will using these utility functions runState and state.
Therefore, all evalState does is take a first argument that's isomorphic to a stateful computation s -> (a, s) and runs it using an initial state given by its second argument. It then throws away the final state s and yields the final result of the computation.
Since it's the first argument to evalState that's the stateful computation, it's actually the ast returned when parse parser source code succeeds that's the stateful transformation s -> (a, s) you're looking for.
That is, the value ast has type:
ast :: State Env (Contract (Check Type, Env, SourcePos))
which is isomorphic to:
ast :: Env -> (Contract (Check Type, Env, SourcePos), Env)
so it's a stateful transformation that operates on a state consisting of an environment (list of symbol tables) and yields a contract. All evalState does is pass this stateful transformation an initial state/environment consisting of a singleton representing the global symbol table and then yields its final contract result (throwing away the final list of symbol tables, since it's no longer important once the contract is generated).
So, the way this compiler is designed, it compiles code into an "abstract syntax tree" that, instead of being a tree-like data structure, is actually a function giving a stateful transformation over an environment state that produces a contract; evalState just "runs" the transformation to generate the contract.
In Haskell the State is monad is passed around to extract and store state. And in the two following examples, both pass the State monad using >>, and a close verification (by function inlining and reduction) confirms that the state is indeed passed to the next step.
Yet this seems not very intuitive. So does this mean when I want to pass the State monad I just need >> (or the >>= and lambda expression \s -> a where s is not free in a)? Can anyone provide an intuitive explanation for this fact without bothering to reduce the function?
-- the first example
tick :: State Int Int
tick = get >>= \n ->
put (n+1) >>
return n
-- the second example
type GameValue = Int
type GameState = (Bool, Int)
playGame' :: String -> State GameState GameValue
playGame' [] = get >>= \(on, score) -> return score
playGame' (x: xs) = get >>= \(on, score) ->
case x of
'a' | on -> put (on, score+1)
'b' | on -> put (on, score-1)
'c' -> put (not on, score)
_ -> put (on, score)
>> playGame xs
Thanks a lot!
It really boils down to understanding that state is isomorphic to s -> (a, s). So any value "wrapped" in a monadic action is a result of applying a transformation to some state s (a stateful computation producing a).
Passing a state between two stateful computations
f :: a -> State s b
g :: b -> State s c
corresponds to composing them with >=>
f >=> g
or using >>=
\a -> f a >>= g
the result here is
a -> State s c
it is a stateful action that transforms some underlying state s in some way, it is allowed access to some a and it produces some c. So the entire transformation is allowed to depend on a and the value c is allowed to depend on some state s. This is exactly what you would want to express a stateful computation. The neat thing (and the sole purpose of expressing this machinery as a monad) is that you do not have to bother with passing the state around. But to understand how it is done, please refer to the definition of >>= on hackage), just ignore for a moment that it is a transformer rather than a final monad).
m >>= k = StateT $ \ s -> do
~(a, s') <- runStateT m s
runStateT (k a) s'
you can disregard the wrapping and unwrapping using StateT and runStateT, here m is in form s -> (a, s), k is of form a -> (s -> (b, s)), and you wish to produce a stateful transformation s -> (b, s). So the result is going to be a function of s, to produce b you can use k but you need a first, how do you produce a? you can take m and apply it to the state s, you get a modified state s' from the first monadic action m, and you pass that state into (k a) (which is of type s -> (b, s)). It is here that the state s has passed through m to become s' and be passed to k to become some final s''.
For you as a user of this mechanism, this remains hidden, and that is the neat thing about monads. If you want a state to evolve along some computation, you build your computation from small steps that you express as State-actions and you let do-notation or bind (>>=) to do the chaining/passing.
The sole difference between >>= and >> is that you either care or don't care about the non-state result.
a >> b
is in fact equivalent to
a >>= \_ -> b
so what ever value gets output by the action a, you throw it away (keeping only the modified state) and continue (pass the state along) with the other action b.
Regarding you examples
tick :: State Int Int
tick = get >>= \n ->
put (n+1) >>
return n
you can rewrite it in do-notation as
tick = do
n <- get
put (n + 1)
return n
while the first way of writing it makes it maybe more explicit what is passed how, the second way nicely shows how you do not have to care about it.
First get the current state and expose it (get :: s -> (s, s) in a simplified setting), the <- says that you do care about the value and you do not want to throw it away, the underlying state is also passed in the background without a change (that is how get works).
Then put :: s -> (s -> ((), s)), which is equivalent after dropping unnecessary parens to put :: s -> s -> ((), s), takes a value to replace the current state with (the first argument), and produces a stateful action whose result is the uninteresting value () which you drop (because you do not use <- or because you use >> instead of >>=). Due to put the underlying state has changed to n + 1 and as such it is passed on.
return does nothing to the underlying state, it only returns its argument.
To summarise, tick starts with some initial value s it updates it to s+1 internally and outputs s on the side.
The other example works exactly the same way, >> is only used there to throw away the () produced by put. But state gets passed around all the time.
This is a type declaration of a bind method:
(>>=) :: (Monad m) => m a -> (a -> m b) -> m b
I read this as follows: apply a function that returns a wrapped value, to a wrapped value.
This method was included to Prelude as part of Monad typeclass. That means there are a lot of cases where it's needed.
OK, but I don't understand why it's a typical solution of a typical case at all.
If you already created a function which returns a wrapped value, why that function doesn't already take a wrapped value?
In other words, what are typical cases where there are many functions which take a normal value, but return a wrapped value? (instead of taking a wrapped value and return a wrapped value)
The 'unwrapping' of values is exactly what you want to keep hidden when dealing with monads, since it is this that causes a lot of boilerplate.
For example, if you have a sequence of operations which return Maybe values that you want to combine, you have to manually propagate Nothing if you receive one:
nested :: a -> Maybe b
nested x = case f x of
Nothing -> Nothing
Just r ->
case g r of
Nothing -> Nothing
Just r' ->
case h r' of
Nothing -> Nothing
r'' -> i r''
This is what bind does for you:
Nothing >>= _ = Nothing
Just a >>= f = f a
so you can just write:
nested x = f x >>= g >>= h >>= i
Some monads don't allow you to manually unpack the values at all - the most common example is IO. The only way to get the value from an IO is to map or >>= and both of these require you to propagate IO in the output.
Everyone focuses on IO monad and inability to "unwrap".
But a Monad is not always a container, so you can't unwrap.
Reader r a == r->a such that (Reader r) is a Monad
to my mind is the simplest best example of a Monad that is not a container.
You can easily write a function that can produce m b given a: a->(r->b). But you can't easily "unwrap" the value from m a, because a is not wrapped in it. Monad is a type-level concept.
Also, notice that if you have m a->m b, you don't have a Monad. What Monad gives you, is a way to build a function m a->m b from a->m b (compare: Functor gives you a way to build a function m a->m b from a->b; ApplicativeFunctor gives you a way to build a function m a->m b from m (a->b))
If you already created a function which returns a wrapped value, why that function doesn't already take a wrapped value?
Because that function would have to unwrap its argument in order to do something with it.
But for many choices of m, you can only unwrap a value if you will eventually rewrap your own result. This idea of "unwrap, do something, then rewrap" is embodied in the (>>=) function which unwraps for you, let's you do something, and forces you to rewrap by the type a -> m b.
To understand why you cannot unwrap without eventually rewrapping, we can look at some examples:
If m a = Maybe a, unwrapping for Just x would be easy: just return x. But how can we unwrap Nothing? We cannot. But if we know that we will eventually rewrap, we can skip the "do something" step and return Nothing for the overall operation.
If m a = [a], unwrapping for [x] would be easy: just return x. But for unwrapping [], we need the same trick as for Maybe a. And what about unwrapping [x, y, z]? If we know that we will eventually rewrap, we can execute the "do something" three times, for x, y and z and concat the results into a single list.
If m a = IO a, no unwrapping is easy because we only know the result sometimes in the future, when we actually run the IO action. But if we know that we will eventually rewrap, we can store the "do something" inside the IO action and perform it later, when we execute the IO action.
I hope these examples make it clear that for many interesting choices of m, we can only implement unwrapping if we know that we are going to rewrap. The type of (>>=) allows precisely this assumption, so it is cleverly chosen to make things work.
While (>>=) can sometimes be useful when used directly, its main purpose is to implement the <- bind syntax in do notation. It has the type m a -> (a -> m b) -> m b mainly because, when used in a do notation block, the right hand side of the <- is of type m a, the left hand side "binds" an a to the given identifier and, when combined with remainder of the do block, is of type a -> m b, the resulting monadic action is of type m b, and this is the only type it possibly could have to make this work.
For example:
echo = do
input <- getLine
putStrLn input
The right hand side of the <- is of type IO String
The left hands side of the <- with the remainder of the do block are of type String -> IO (). Compare with the desugared version using >>=:
echo = getLine >>= (\input -> putStrLn input)
The left hand side of the >>= is of type IO String. The right hand side is of type String -> IO (). Now, by applying an eta reduction to the lambda we can instead get:
echo = getLine >>= putStrLn
which shows why >>= is sometimes used directly rather than as the "engine" that powers do notation along with >>.
I'd also like to provide what I think is an important correction to the concept of "unwrapping" a monadic value, which is that it doesn't happen. The Monad class does not provide a generic function of type Monad m => m a -> a. Some particular instances do but this is not a feature of monads in general. Monads, generally speaking, cannot be "unwrapped".
Remember that m >>= k = join (fmap k m) is a law that must be true for any monad. Any particular implementation of >>= must satisfy this law and so must be equivalent to this general implementation.
What this means is that what really happens is that the monadic "computation" a -> m b is "lifted" to become an m a -> m (m b) using fmap and then applied the m a, giving an m (m b); and then join :: m (m a) -> m a is used to squish the two ms together to yield a m b. So the a never gets "out" of the monad. The monad is never "unwrapped". This is an incorrect way to think about monads and I would strongly recommend that you not get in the habit.
I will focus on your point
If you already created a function which returns a wrapped value, why
that function doesn't already take a wrapped value?
and the IO monad. Suppose you had
getLine :: IO String
putStrLn :: IO String -> IO () -- "already takes a wrapped value"
how one could write a program which reads a line and print it twice? An attempt would be
let line = getLine
in putStrLn line >> putStrLn line
but equational reasoning dictates that this is equivalent to
putStrLn getLine >> putStrLn getLine
which reads two lines instead.
What we lack is a way to "unwrap" the getLine once, and use it twice. The same issue would apply to reading a line, printing "hello", and then printing a line:
let line = getLine in putStrLn "hello" >> putStrLn line
-- equivalent to
putStrLn "hello" >> putStrLn getLine
So, we also lack a way to specify "when to unwrap" the getLine. The bind >>= operator provides a way to do this.
A more advanced theoretical note
If you swap the arguments around the (>>=) bind operator becomes (=<<)
(=<<) :: (a -> m b) -> (m a -> m b)
which turns any function f taking an unwrapped value into a function g taking a wrapped
value. Such g is known as the Kleisli extension of f. The bind operator guarantees
such an extension always exists, and provides a convenient way to use it.
Because we like to be able to apply functions like a -> b to our m as. Lifting such a function to m a -> m b is trivial (liftM, liftA, >>= return ., fmap) but the opposite is not necessarily possible.
You want some typical examples? How about putStrLn :: String -> IO ()? It would make no sense for this function to have the type IO String -> IO () because the origin of the string doesn't matter.
Anyway: You might have the wrong idea because of your "wrapped value" metaphor; I use it myself quite often, but it has its limitations. There isn't necessarily a pure way to get an a out of an m a - for example, if you have a getLine :: IO String, there's not a great deal of interesting things you can do with it - you can put it in a list, chain it in a row and other neat things, but you can't get any useful information out of it because you can't look inside an IO action. What you can do is use >>= which gives you a way to use the result of the action.
Similar things apply to monads where the "wrapping" metaphor applies too; For example the point Maybe monad is to avoid manually wrapping and unwrapping values with and from Just all the time.
My two most common examples:
1) I have a series of functions that generate a list of lists, but I finally need a flat list:
f :: a -> [a]
fAppliedThrice :: [a] -> [a]
fAppliedThrice aList = concat (map f (concat (map f (concat (map f a)))))
fAppliedThrice' :: [a] -> [a]
fAppliedThrice' aList = aList >>= f >>= f >>= f
A practical example of using this was when my functions fetched attributes of a foreign key relationship. I could just chain them together to finally obtain a flat list of attributes. Eg: Product hasMany Review hasMany Tag type relationship, and I finally want a list of all the tag names for a product. (I added some template-haskell and got a very good generic attribute fetcher for my purposes).
2) Say you have a series of filter-like functions to apply to some data. And they return Maybe values.
case (val >>= filter >>= filter2 >>= filter3) of
Nothing -> putStrLn "Bad data"
Just x -> putStrLn "Good data"
I'm trying to grasp the State Monad and with this purpose I wanted to write a monadic code that would generate a sequence of random numbers using a Linear Congruential Generator (probably not good, but my intention is just to learn the State Monad, not build a good RNG library).
The generator is just this (I want to generate a sequence of Bools for simplicity):
type Seed = Int
random :: Seed -> (Bool, Seed)
random seed = let (a, c, m) = (1664525, 1013904223, 2^32) -- some params for the LCG
seed' = (a*seed + c) `mod` m
in (even seed', seed') -- return True/False if seed' is even/odd
Don't worry about the numbers, this is just an update rule for the seed that (according to Numerical Recipes) should generate a pseudo-random sequence of Ints. Now, if I want to generate random numbers sequentially I'd do:
rand3Bools :: Seed -> ([Bool], Seed)
rand3Bools seed0 = let (b1, seed1) = random seed0
(b2, seed2) = random seed1
(b3, seed3) = random seed2
in ([b1,b2,b3], seed3)
Ok, so I could avoid this boilerplate by using a State Monad:
import Control.Monad.State
data Random {seed :: Seed, value :: Bool}
nextVal = do
Random seed val <- get
let seed' = updateSeed seed
val' = even seed'
put (Random seed' val')
return val'
updateSeed seed = let (a,b,m) = (1664525, 1013904223, 2^32) in (a*seed + c) `mod` m
And finally:
getNRandSt n = replicateM n nextVal
getNRand :: Int -> Seed -> [Bool]
getNRand n seed = evalState (getNRandStates n) (Random seed True)
Ok, this works fine and give me a list of n pseudo-random Bools for each given seed. But...
I can read what I've done (mainly based on this example: http://www.haskell.org/pipermail/beginners/2008-September/000275.html ) and replicate it to do other things. But I don't think I can understand what's really happening behind the do-notation and monadic functions (like replicateM).
Can anyone help me with some of this doubts?
1 - I've tried to desugar the nextVal function to understand what it does, but I couldn't. I can guess it extracts the current state, updates it and then pass the state ahead to the next computation, but this is just based on reading this do-sugar as if it was english.
How do I really desugar this function to the original >>= and return functions step-by-step?
2 - I couldn't grasp what exactly the put and get functions do. I can guess that they "pack" and "unpack" the state. But the mechanics behind the do-sugar is still elusive to me.
Well, any other general remarks about this code are very welcome. I sometimes fell with Haskell that I can create a code that works and do what I expect it to do, but I can't "follow the evaluation" as I'm accustomed to do with imperative programs.
The State monad does look kind of confusing at first; let's do as Norman Ramsey suggested, and walk through how to implement from scratch. Warning, this is pretty lengthy!
First, State has two type parameters: the type of the contained state data and the type of the final result of the computation. We'll use stateData and result respectively as type variables for them here. This makes sense if you think about it; the defining characteristic of a State-based computation is that it modifies a state while producing an output.
Less obvious is that the type constructor takes a function from a state to a modified state and result, like so:
newtype State stateData result = State (stateData -> (result, stateData))
So while the monad is called "State", the actual value wrapped by the the monad is that of a State-based computation, not the actual value of the contained state.
Keeping that in mind, we shouldn't be surprised to find that the function runState used to execute a computation in the State monad is actually nothing more than an accessor for the wrapped function itself, and could be defined like this:
runState (State f) = f
So what does it mean when you define a function that returns a State value? Let's ignore for a moment the fact that State is a monad, and just look at the underlying types. First, consider this function (which doesn't actually do anything with the state):
len2State :: String -> State Int Bool
len2State s = return ((length s) == 2)
If you look at the definition of State, we can see that here the stateData type is Int, and the result type is Bool, so the function wrapped by the data constructor must have the type Int -> (Bool, Int). Now, imagine a State-less version of len2State--obviously, it would have type String -> Bool. So how would you go about converting such a function into one returning a value that fits into a State wrapper?
Well, obviously, the converted function will need to take a second parameter, an Int representing the state value. It also needs to return a state value, another Int. Since we're not actually doing anything with the state in this function, let's just do the obvious thing--pass that int right on through. Here's a State-shaped function, defined in terms of the State-less version:
len2 :: String -> Bool
len2 s = ((length s) == 2)
len2State :: String -> (Int -> (Bool, Int))
len2State s i = (len2' s, i)
But that's kind of silly and redundant. Let's generalize the conversion so that we can pass in the result value, and turn anything into a State-like function.
convert :: Bool -> (Int -> (Bool, Int))
convert r d = (r, d)
len2 s = ((length s) == 2)
len2State :: String -> (Int -> (Bool, Int))
len2State s = convert (len2 s)
What if we want a function that changes the state? Obviously we can't build one with convert, since we wrote that to pass the state through. Let's keep it simple, and write a function to overwrite the state with a new value. What kind of type would it need? It'll need an Int for the new state value, and of course will have to return a function stateData -> (result, stateData), because that's what our State wrapper needs. Overwriting the state value doesn't really have a sensible result value outside the State computation, so our result here will just be (), the zero-element tuple that represents "no value" in Haskell.
overwriteState :: Int -> (Int -> ((), Int))
overwriteState newState _ = ((), newState)
That was easy! Now, let's actually do something with that state data. Let's rewrite len2State from above into something more sensible: we'll compare the string length to the current state value.
lenState :: String -> (Int -> (Bool, Int))
lenState s i = ((length s) == i, i)
Can we generalize this into a converter and a State-less function, like we did before? Not quite as easily. Our len function will need to take the state as an argument, but we don't want it to "know about" state. Awkward, indeed. However, we can write a quick helper function that handles everything for us: we'll give it a function that needs to use the state value, and it'll pass the value in and then package everything back up into a State-shaped function leaving len none the wiser.
useState :: (Int -> Bool) -> Int -> (Bool, Int)
useState f d = (f d, d)
len :: String -> Int -> Bool
len s i = (length s) == i
lenState :: String -> (Int -> (Bool, Int))
lenState s = useState (len s)
Now, the tricky part--what if we want to string these functions together? Let's say we want to use lenState on a string, then double the state value if the result is false, then check the string again, and finally return true if either check did. We have all the parts we need for this task, but writing it all out would be a pain. Can we make a function that automatically chains together two functions that each return State-like functions? Sure thing! We just need to make sure it takes as arguments two things: the State function returned by the first function, and a function that takes the prior function's result type as an argument. Let's see how it turns out:
chainStates :: (Int -> (result1, Int)) -> (result1 -> (Int -> (result2, Int))) -> (Int -> (result2, Int))
chainStates prev f d = let (r, d') = prev d
in f r d'
All this is doing is applying the first state function to some state data, then applying the second function to the result and the modified state data. Simple, right?
Now, the interesting part: Between chainStates and convert, we should almost be able to turn any combination of State-less functions into a State-enabled function! The only thing we need now is a replacement for useState that returns the state data as its result, so that chainStates can pass it along to the functions that don't know anything about the trick we're pulling on them. Also, we'll use lambdas to accept the result from the previous functions and give them temporary names. Okay, let's make this happen:
extractState :: Int -> (Int, Int)
extractState d = (d, d)
chained :: String -> (Int -> (Bool, Int))
chained str = chainStates extractState $ \state1 ->
let check1 = (len str state1) in
chainStates (overwriteState (
if check1
then state1
else state1 * 2)) $ \ _ ->
chainStates extractState $ \state2 ->
let check2 = (len str state2) in
convert (check1 || check2)
And try it out:
> chained "abcd" 2
(True, 4)
> chained "abcd" 3
(False, 6)
> chained "abcd" 4
(True, 4)
> chained "abcdef" 5
(False, 10)
Of course, we can't forget that State is actually a monad that wraps the State-like functions and keeps us away from them, so none of our nifty functions that we've built will help us with the real thing. Or will they? In a shocking twist, it turns out that the real State monad provides all the same functions, under different names:
runState (State s) = s
return r = State (convert r)
(>>=) s f = State (\d -> let (r, d') = (runState s) d in
runState (f r) d')
get = State extractState
put d = State (overwriteState d)
Note that >>= is almost identical to chainStates, but there was no good way to define it using chainStates. So, to wrap things up, we can rewrite the final example using the real State:
chained str = get >>= \state1 ->
let check1 = (len str state1) in
put (if check1
then state1 else state1 * 2) >>= \ _ ->
get >>= \state2 ->
let check2 = (len str state2) in
return (check1 || check2)
Or, all candied up with the equivalent do notation:
chained str = do
state1 <- get
let check1 = len str state1
_ <- put (if check1 then state1 else state1 * 2)
state2 <- get
let check2 = (len str state2)
return (check1 || check2)
First of all, your example is overly complicated because it doesn't need to store the val in the state monad; only the seed is the persistent state. Second, I think you will have better luck if instead of using the standard state monad, you re-implement all of the state monad and its operations yourself, with their types. I think you will learn more this way. Here are a couple of declarations to get you started:
data MyState s a = MyState (s -> (s, b))
get :: Mystate s s
put :: s -> Mystate s ()
Then you can write your own connectives:
unit :: a -> Mystate s a
bind :: Mystate s a -> (a -> Mystate s b) -> Mystate s b
Finally
data Seed = Seed Int
nextVal :: Mystate Seed Bool
As for your trouble desugaring, the do notation you are using is pretty sophisticated.
But desugaring is a line-at-a-time mechanical procedure. As near as I can make out, your code should desugar like this (going back to your original types and code, which I disagree with):
nextVal = get >>= \ Random seed val ->
let seed' = updateSeed seed
val' = even seed'
in put (Random seed' val') >>= \ _ -> return val'
In order to make the nesting structure a bit clearer, I've taken major liberties with the indentation.
You've got a couple great responses. What I do when working with the State monad is in my mind replace State s a with s -> (s,a) (after all, that's really what it is).
You then get a type for bind that looks like:
(>>=) :: (s -> (s,a)) ->
(a -> s -> (s,b)) ->
(s -> (s,b))
and you see that bind is just a specialized kind of function composition operator, like (.)
I wrote a blog/tutorial on the state monad here. It's probably not particularly good, but helped me grok things a little better by writing it.