Counting occurrences in an expression - Haskell

I'm pretty new to Haskell and I have an assessment which involves a manipulator and evaluator of boolean expressions.
The expression type is:
type Variable = String
data Expr = T | Var Variable | And Expr Expr | Not Expr
I've worked through a lot of the questions but I am stuck on how to approach the following function. I need to count the occurrences of all the variables in an expression:
addCounter :: Expr -> Expr
addCounter = undefined
prop_addCounter1 = addCounter (And (Var "y") (And (Var "x") (Var "y"))) ==
And (Var "y1") (And (Var "x2") (Var "y1"))
prop_addCounter2 = addCounter (Not (And (Var "y") T)) ==
Not (And (Var "y1") T)
I'm not looking for an answer on exactly how to do this, as it is an assessment question, but I would like some tips on how to go about approaching it.
In my head I imagine incrementing a counter so that I can get the y1, x2 part, but mutating a counter isn't really something you can do in Haskell (or at least it isn't advised!). Would I go about this through recursion, and if so, how do I know what number to add to the variable?

As you say, you cannot keep a shared mutable counter, which would be very natural in this case. What you can do instead is pass the current counter value down the tree as you recursively visit all the Exprs, and receive back the incremented counter value from each call. It has to be two-way communication: you pass down the current value and receive back the updated Expr together with the new counter value.
If you want each unique variable name to have the same counter value you need to keep a mapping of variable names to assigned counter values. You need to pass that one around just like the current counter value.
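To see the shape of that two-way threading without spoiling the assessment, here is a rough sketch on a deliberately different, made-up tree type (Tree and label are illustrative names only):
-- A hypothetical tree, used only to illustrate the threading pattern.
data Tree = Leaf String | Node Tree Tree

-- Pass the current counter down; get the rebuilt tree and the new counter back.
label :: Tree -> Int -> (Tree, Int)
label (Leaf s)   n = (Leaf (s ++ show n), n + 1)
label (Node l r) n =
  let (l', n')  = label l n
      (r', n'') = label r n'
  in (Node l' r', n'')
A map from names to already-assigned numbers would be threaded through in exactly the same way, alongside the counter.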
Hope that helps.

Atomize your stateful updates
So, this is definitely a great time to use a State monad. In particular, the atomic transform you're looking for is a String -> String mapping that enumerates strings, appending a unique id to each one. Let's call it enumerate:
import Control.Monad.State
-- | This is the only function which is going to touch our 'Variable's
enumerate :: Variable -> State OurState Variable
To do this, we'll need to track state that maps Strings to counts (Ints)
import qualified Data.Map as M
type OurState = M.Map String Int
runOurState :: State OurState a -> a
runOurState = flip evalState M.empty
runOurState $ mapM enumerate ["x", "y", "z", "x" ,"x", "x", "y"]
-- ["x1", "y1", "z1", "x2", "x3", "x4", "y2"]
so we can implement enumerate pretty directly as a stateful action.
enumerate :: Variable -> State OurState Variable
enumerate var = do
  m <- get
  let n = 1 + M.findWithDefault 0 var m
  put $ M.insert var n m
  return $ var ++ show n
Cool!
Folding generically over an expression tree
Now we really ought to write an elaborate folding apparatus which maps Expr -> State OurState Expr by applying enumerate on each Var-type leaf.
enumerateExpr :: Expr -> State OurState Expr
enumerateExpr T = return T
enumerateExpr (Var s) = fmap Var (enumerate s)
enumerateExpr (And e1 e2) = do
  em1 <- enumerateExpr e1
  em2 <- enumerateExpr e2
  return (And em1 em2)
enumerateExpr (Not expr) = fmap Not (enumerateExpr expr)
But this is pretty tedious, so we'll use the Uniplate library to keep things DRY.
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data
import Data.Generics.Uniplate.Data
data Expr = T | Var Variable | And Expr Expr | Not Expr
  deriving (Show, Eq, Ord, Data)
onVarStringM :: (Variable -> State OurState Variable) -> Expr -> State OurState Expr
onVarStringM action = transformM go
  where
    go :: Expr -> State OurState Expr
    go (Var s) = fmap Var (action s)
    go x = return x
The transformM operator does just what we want—apply a monadic transformation over all the pieces of a generic tree (our Expr).
So now, we just unpack the Stateful action to make addCounter
addCounter :: Expr -> Expr
addCounter = runOurState . onVarStringM enumerate
Oh, wait!
Just noticed, this doesn't actually have the right behavior—it doesn't enumerate your variables quite right (prop_addCounter1 fails but prop_addCounter2 passes). Unfortunately, I'm not really sure how it ought to be done... but given this separation of concerns laid out here it'd be very easy to just write the appropriate enumerate Stateful action and apply it to the same generic Expr-transforming machinery.
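For what it's worth, one possible reading of the props is that each distinct name gets the next number on its first appearance and every later occurrence of that name reuses it. A hedged sketch of such an action (enumerateShared is my name, not part of the answer above):
-- A sketch under one reading of the props: each distinct name gets the next
-- number on its first appearance, and every later occurrence reuses it.
enumerateShared :: Variable -> State OurState Variable
enumerateShared var = do
  m <- get
  case M.lookup var m of
    Just n  -> return (var ++ show n)
    Nothing -> do
      let n = M.size m + 1
      put (M.insert var n m)
      return (var ++ show n)
Then addCounter = runOurState . onVarStringM enumerateShared should satisfy both properties, assuming transformM visits the variables left to right.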

Related

How and when to use State functor and State applicative?

I've seen the Maybe and Either functor (and applicative) used in code and that made sense, but I have a hard time coming up with an example of the State functor and applicative. Maybe they are not very useful and only exist because the State monad requires a functor and an applicative? There are plenty of explanations of their implementations out there but not any examples when they are used in code, so I'm looking for illustrations of how they might be useful on their own.
I can think of a couple of examples off the top of my head.
First, one common use for State is to manage a counter for the purpose of making some set of "identifiers" unique. So, the state itself is an Int, and the main primitive state operation is to retrieve the current value of the counter and increment it:
-- the state
type S = Int
newInt :: State S Int
newInt = state (\s -> (s, s+1))
The functor instance is then a succinct way of using the same counter for different types of identifiers, such as term- and type-level variables in some language:
type Prefix = String
data Var = Var Prefix Int
data TypeVar = TypeVar Prefix Int
where you generate fresh identifiers like so:
newVar :: Prefix -> State S Var
newVar s = Var s <$> newInt
newTypeVar :: Prefix -> State S TypeVar
newTypeVar s = TypeVar s <$> newInt
The applicative instance is helpful for writing expressions constructed from such unique identifiers. For example, I've used this approach pretty frequently when writing type checkers, which will often construct types with fresh variables, like so:
typeCheckAFunction = ...
let freshFunctionType = ArrowType <$> newTypeVar "a" <*> newTypeVar "b"
...
Here, freshFunctionType is a new a -> b style type with fresh type variables a and b that can be passed along to a unification step.
Second, another use of State is to manage a seed for random number generation. For example, if you want a low-quality but ultra-fast LCG generator for something, you can write:
lcg :: Word32 -> Word32
lcg x = (a * x + c)
  where a = 1664525
        c = 1013904223
-- monad for random numbers
type L = State Word32
randWord32 :: L Word32
randWord32 = state $ \s -> let s' = lcg s in (s', s')
The functor instance can be used to modify the Word32 output using a pure conversion function:
randUniform :: L Double
randUniform = toUnit <$> randWord32
  where toUnit w = fromIntegral w / fromIntegral (maxBound `asTypeOf` w)
while the applicative instance can be used to write primitives that depend on multiple Word32 outputs:
randUniform2 :: L (Double, Double)
randUniform2 = (,) <$> randUniform <*> randUniform
and expressions that use your random numbers in a reasonably natural way:
-- area of a random triangle, say
a = areaOf <$> (Triangle <$> randUniform2 <*> randUniform2 <*> randUniform2)
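To actually draw numbers you run the State computation with an initial seed, for example with evalState. A small usage sketch (the names and seed literals are just for illustration):
-- Run a State-based generator by supplying an initial Word32 seed.
threeUniforms :: [Double]
threeUniforms = evalState (replicateM 3 randUniform) 42

pairOfUniforms :: (Double, Double)
pairOfUniforms = evalState randUniform2 12345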

Why are ML/Haskell datatypes useful for defining "languages" like arithmetic expressions?

This is more of a soft question about static type systems in functional languages like those of the ML family. I understand why you need datatypes to describe data structures like lists and trees but defining "expressions" like those of propositional logic within datatypes seems to bring just some convenience and is not necessary. For example
datatype arithmetic_exp = Constant of int
                        | Neg of arithmetic_exp
                        | Add of (arithmetic_exp * arithmetic_exp)
                        | Mult of (arithmetic_exp * arithmetic_exp)
defines a set of values, on which you can write an eval function which would give you the result. You could just as well define 4 functions: const: int -> int, neg: int -> int, add: int * int -> int and mult: int * int -> int and then an expression of the sort add (mult (const 3, neg 2), neg 4) would give you the same thing without any loss of static security. The only complication is that you have to do four things instead of two. While learning SML and Haskell I've been trying to think about which features give you something necessary and which are just a convenience, so this is the reason why I'm asking. I guess this would matter if you want to decouple the process of evaluating a value from the value itself but I'm not sure where that would be useful.
Thanks a lot.
There is a duality between initial / first-order / datatype-based encodings (aka deep embeddings) and final / higher-order / evaluator-based encodings (aka shallow embeddings). You can indeed typically use a typeclass of combinators instead of a datatype (and convert back and forth between the two).
Here is a module showing the two approaches:
{-# LANGUAGE GADTs, Rank2Types #-}
module Expr where
data Expr where
  Val :: Int -> Expr
  Add :: Expr -> Expr -> Expr

class Expr' a where
  val :: Int -> a
  add :: a -> a -> a
You can see that the two definitions look eerily similar. Expr' a is basically describing an algebra on Expr which means that you can get an a out of an Expr if you have such an Expr' a. Similarly, because you can write an instance Expr' Expr, you're able to reify a term of type forall a. Expr' a => a into a syntactic value of type Expr:
expr :: Expr' a => Expr -> a
expr e = case e of
  Val n   -> val n
  Add p q -> add (expr p) (expr q)

instance Expr' Expr where
  val = Val
  add = Add
expr' :: (forall a. Expr' a => a) -> Expr
expr' e = e
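To make the duality concrete, here is a small illustrative round trip; the Expr' Int instance and the names below are assumptions, not part of the module above:
-- Evaluating instance: interpret the combinators directly as Ints.
instance Expr' Int where
  val = id
  add = (+)

-- A term written once against the class...
example :: Expr' a => a
example = add (val 1) (add (val 2) (val 3))

-- ...can be reified into syntax, or folded straight down to a number.
asTree :: Expr
asTree = expr' example   -- Add (Val 1) (Add (Val 2) (Val 3))

asValue :: Int
asValue = expr asTree    -- 6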
In the end, picking a representation over another really depends on what your main focus is: if you want to inspect the structure of the expression (e.g. if you want to optimise / compile it), it's easier if you have access to an AST. If, on the other hand, you're only interested in computing an invariant using a fold (e.g. the depth of the expression or its evaluation), a higher order encoding will do.
The ADT is in a form you can inspect and manipulate in ways other than simply evaluating it. Once you hide all the interesting data in a function call, there is no longer anything you can do with it but evaluate it. Consider this definition, similar to the one in your question, but with a Var term to represent variables and with the Mul and Neg terms removed to focus on addition.
data Expr a = Constant a
            | Add (Expr a) (Expr a)
            | Var String
            deriving Show
The obvious function to write is eval, of course. It requires a way to look up a variable's value, and is straightforward:
-- cheating a little bit by assuming all Vars are defined
eval :: Num a => Expr a -> (String -> a) -> a
eval (Constant x) _env = x
eval (Add x y) env = eval x env + eval y env
eval (Var x) env = env x
But suppose you don't have a variable mapping yet. You have a large expression that you will evaluate many times, for different choices of variable. And some silly recursive function built up an expression like:
Add (Constant 1)
    (Add (Constant 1)
         (Add (Constant 1)
              (Add (Constant 1)
                   (Add (Constant 1)
                        (Add (Constant 1)
                             (Var "x"))))))
It would be wasteful to recompute 1+1+1+1+1+1 every time you evaluate this: wouldn't it be nice if your evaluator could realize that this is just another way of writing Add (Constant 6) (Var "x")?
So, you write an expression optimizer, which runs before any variables are available and tries to simplify expressions. There are many simplification rules you could apply, of course; below I've implemented just two very easy ones to illustrate the point.
simplify :: Num a => Expr a -> Expr a
simplify (Add (Constant x) (Constant y)) = Constant $ x + y
simplify (Add (Constant x) (Add (Constant y) z)) = simplify $ Add (Constant $ x + y) z
simplify x = x
Now how does our silly expression look?
> simplify $ Add (Constant 1) (Add (Constant 1) (Add (Constant 1) (Add (Constant 1) (Add (Constant 1) (Add (Constant 1) (Var "x"))))))
Add (Constant 6) (Var "x")
All the unnecessary stuff has been removed, and you now have a nice clean expression to try for various values of x.
How do you do the same thing with a representation of this expression in functions? You can't, because there is no "intermediate form" between the initial specification of the expression and its final evaluation: you can only look at the expression as a single, opaque function call. Evaluating it at a particular value of x necessarily evaluates each of the subexpressions anew, and there is no way to disentangle them.
Here's an extension of the functional type you propose in your question, again enriched with variables:
type FExpr a = (String -> a) -> a
lit :: a -> FExpr a
lit x _env = x
add :: Num a => FExpr a -> FExpr a -> FExpr a
add x y env = x env + y env
var :: String -> FExpr a
var x env = env x
with the same silly expression to evaluate many times:
sample :: Num a => FExpr a
sample = add (lit 1)
             (add (lit 1)
                  (add (lit 1)
                       (add (lit 1)
                            (add (lit 1)
                                 (add (lit 1)
                                      (var "x"))))))
It works as expected:
> sample $ \_var -> 5
11
But it has to do a bunch of addition every time you try it for a different x, even though the addition and the variable are mostly unrelated. And there's no way you can simplify the expression tree. You can't simplify it while defining it: that is, you can't make add smarter, because it can't inspect its arguments at all: its arguments are functions which, as far as add is concerned, could do anything at all. And you can't simplify it after constructing it, either: at that point you just have an opaque function that takes in a variable lookup function and produces a value.
By modeling the important parts of your problem as data types in their own right, you make them values that your program can manipulate intelligently. If you leave them as functions, you get a shorter program that is less powerful, because you lock all the information inside lambdas that only GHC can manipulate.
And once you've written it with ADTs, it's not hard to collapse that representation back into the shorter function-based representation if you want. That is, it might be nice to have a function of type
convert :: Expr a -> FExpr a
But in fact, we've already done this! That's exactly the type that eval has. You just might not have noticed because of the FExpr type alias, which is not used in the definition of eval.
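Spelled out, purely as a restatement of what is already above:
convert :: Num a => Expr a -> FExpr a
convert = eval

-- e.g. convert (Add (Constant 1) (Var "x")) (\_ -> 5) == 6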
So in a way, the ADT representation is more general and more powerful, acting as a tree that you can fold up in many different ways. One of those ways is evaluating it, in exactly the way that the function-based representation does. But there are others:
Simplify the expression before evaluating it
Produce a list of all the variables that must be defined for this expression to be well formed
Count how deeply nested the deepest part of the tree is, to estimate how many stack frames an evaluator might need
Convert the expression to a String approximating a Haskell expression you could type to get the same result
So if possible you'd like to work with information-rich ADTs for as long as you can, and then eventually fold the tree up into a more compact form once you have something specific to do with it.
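For instance, the second and third of those folds might look like this (sketches; variables and depth are illustrative names, not from the text above):
-- List every variable the expression mentions.
variables :: Expr a -> [String]
variables (Constant _) = []
variables (Var s)      = [s]
variables (Add x y)    = variables x ++ variables y

-- Depth of the deepest node, a rough proxy for evaluator stack usage.
depth :: Expr a -> Int
depth (Constant _) = 1
depth (Var _)      = 1
depth (Add x y)    = 1 + max (depth x) (depth y)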

How can I work in nested monads cleanly?

I'm writing an interpreter for a small language.
This language supports mutation, so its evaluator keeps track of a Store for all the variables (where type Store = Map.Map Address Value, type Address = Int, and data Value is a language-specific ADT).
It's also possible for computations to fail (e.g., dividing by zero), so the result has to be an Either String Value.
The type of my interpreter, then, is
eval :: Environment -> Expression -> State Store (Either String Value)
where type Environment = Map.Map Identifier Address keeps track of local bindings.
For example, interpreting a constant literal doesn't need to touch the store, and the result always succeeds, so
eval _ (LiteralExpression v) = return $ Right v
But when we apply a binary operator, we do need to consider the store.
For example, if the user evaluates (+ (x <- (+ x 1)) (x <- (+ x 1))) and x is initially 0, then the final result should be 3, and x should be 2 in the resulting store.
This leads to the case
eval env (BinaryOperator op l r) = do
  lval <- eval env l
  rval <- eval env r
  return $ join $ liftM2 (applyBinop op) lval rval
Note that the do-notation is working within the State Store monad.
Furthermore, the use of return is monomorphic in State Store, while the uses of join and liftM2 are monomorphic in the Either String monad.
That is, here we use
(return . join) :: Either String (Either String Value) -> State Store (Either String Value)
and return . join is not a no-op.
(As is evident, applyBinop :: Identifier -> Value -> Value -> Either String Value.)
This seems confusing at best, and this is a relatively simple case.
The case of function application, for example, is considerably more complicated.
What useful best practices should I know about to keep my code readable—and writable?
EDIT: Here's a more typical example, which better showcases the ugliness.
The NewArrayC variant has parameters length :: Expression and element :: Expression (it creates an array of a given length with all elements initialized to a constant).
A simple example is (newArray 3 "foo"), which yields ["foo", "foo", "foo"], but we could also write (newArray (+ 1 2) (concat "fo" "oo")), because we can have arbitrary expressions in a NewArrayC.
But when we actually call
allocateMany :: Int -> Value -> State Store Address,
which takes the number of elements to allocate and the value for each slot, and returns the starting address, we need to unpack those values.
In the logic below, you can see that I'm duplicating a bunch of logic that should be built-in to the Either monad.
All the cases should just be binds.
eval env (NewArrayC len el) = do
  lenVal <- eval env len
  elVal <- eval env el
  case lenVal of
    Right (NumV lenNum) -> case elVal of
      Right val -> do
        addr <- allocateMany lenNum val
        return $ Right $ ArrayV addr lenNum -- result data type
      left -> return left
    Right _ -> return $ Left "expected number in new-array length"
    left -> return left
This is what monad transformers are for. There is a StateT transformer to add state to a stack, and an EitherT transformer to add Either-like failure to a stack; however, I prefer ExceptT (which adds Except-like failure), so I will give my discussion in terms of that. Since you want the stateful bit outermost, you should use ExceptT e (State s) as your monad.
type DSL = ExceptT String (State Store)
Note that the stateful operations can be spelled get and put, and these are polymorphic over all instances of MonadState; so that in particular they will work okay in our DSL monad. Similarly, the canonical way to raise an error is throwError, which is polymorphic over all instances of MonadError String; and in particular will work okay in our DSL monad.
So now we would write
eval :: Environment -> Expression -> DSL Value
eval _ (Literal v) = return v
eval e (Binary op l r) = liftM2 (applyBinop op) (eval e l) (eval e r)
You might also consider giving eval a more polymorphic type; it could return an (MonadError String m, MonadState Store m) => m Value instead of a DSL Value. In fact, for allocateMany, it's important that you give it a polymorphic type:
allocateMany :: MonadState Store m => Int -> Value -> m Address
There are two pieces of interest about this type: first, because it is polymorphic over all MonadState Store m instances, you can be just as sure that it only has stateful side effects as if it had the type Int -> Value -> State Store Address that you suggested. However, also because it is polymorphic, it can be specialized to return a DSL Address, so it can be used in (for example) eval. Your example eval code becomes this:
eval env (NewArrayC len el) = do
  lenVal <- eval env len
  elVal <- eval env el
  case lenVal of
    NumV lenNum -> do
      addr <- allocateMany lenNum elVal
      return (ArrayV addr lenNum)
    _ -> throwError "expected number in new-array length"
I think that's quite readable, really; nothing too extraneous there.
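For completeness, running a DSL action back down to the asker's original shape might look like this (runDSL is an illustrative name, not from the answer):
import Control.Monad.Except (runExceptT)
import Control.Monad.State (runState)

-- runExceptT exposes the Either; runState then threads the Store through.
runDSL :: DSL a -> Store -> (Either String a, Store)
runDSL act = runState (runExceptT act)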

Calling a function in Haskell

I'm pretty new to Haskell programming. I want to call some functions and save the result in a variable, but I don't know how. I read a couple of chapters about Haskell functions in two different books but still don't understand how to do it.
import Data.Map (Map)
import qualified Data.Map as M hiding (Map)
newtype GenEnv elt = Env (Map Id elt)
newEnv :: GenEnv elt -- initialise
newEnv = Env M.empty
getEnv :: GenEnv elt -> Id -> Maybe elt -- G(x) (key function)
getEnv (Env env) var = M.lookup var env
union :: GenEnv elt -> (Id,elt) -> GenEnv elt -- G[x:v]
union (Env env) (key,elt) = Env (M.insert key elt env)
-- foldr is faster than addToFM_list!
unionL :: GenEnv elt -> [(Id,elt)] -> GenEnv elt -- list union
unionL (Env env) pairs = Env $ foldr (\(k,e) g -> M.insert k e g) env pairs
What I'm asking here is NOT for somebody to do my work; I'm asking how to call the functions, because I don't understand their signatures.
As others have mentioned, "variable" is perhaps not the right term. And in the same vein, "calling" a function is perhaps not the right term either. It is helpful, in my opinion, to think about this in terms of mathematical functions:
f(x) = x^2
given the above function, you don't "call" it with a value so much as give a name to the result of evaluating that function at a particular argument:
y = f(2)
It's the same in Haskell. Somewhere in your code you need to use the result of evaluating a function at a particular value. To do that, you can just use the application of that function to that value wherever you need it, or you can bind it to a name in a let binding or a where clause.
So, to provide a simple example in Haskell, you can do something like this:
f :: Int -> Int
f x = x^2
y :: Int
y = f 2
g :: Int -> Int
g x = let y = f 3
      in y + 1

h :: Int -> Int
h x = y + 1
  where y = f 3
Here I have declared a function called f which takes a single Int value and returns a new Int value, the square of the argument. Then I have declared an Int value named y to be the result of applying f to 2. The value y is not a variable, but rather a binding. It will always be 4.
Then I have declared two other functions, g and h which are equivalent, showing local bindings of the results of applying f.
In your example, the types are perhaps complicating things a little bit. Env is a constructor used to construct a value of the GenEnv type. So, to create a value of the GenEnv type, you apply Env to an appropriate argument. This is what newEnv is doing.
Hopefully that's enough to get you started.
Since I pretty much suck at explaining these things, I'd recommend reading this chapter:
http://book.realworldhaskell.org/read/types-and-functions.html
It should cover everything you need to know to be able to call those functions.
In general, in Haskell we apply a function to some arguments and bind the result to a name.
Haskell doesn't have mutable variables, first of all. Generally we talk of binding an expression to an identifier, creating a thunk.
It has the form of
let val = unionL g l
in ...  -- an expression using val
In Haskell, function application is written by juxtaposition and associates to the left, so union newEnv (1, 'a') means (union newEnv) (1, 'a'); don't use commas to separate parameters like you may be used to in other languages.
For instance, assuming Id is just a type synonym for Int, then the following would be an example GenEnv.
example :: GenEnv Char
example = union newEnv (1, 'a')
which, written with the grouping explicit, is the same as
example :: GenEnv Char
example = (union newEnv) (1, 'a')
Say you have a list of letters each paired with an integer key, called alphabetKeys; you can use the handy little tool you defined to turn that into a GenEnv pretty easily, just by
example :: GenEnv Char
example = unionL newEnv alphabetKeys
again, written with the grouping explicit,
example :: GenEnv Char
example = (unionL newEnv) alphabetKeys
would work just as well.

State Monad, sequences of random numbers and monadic code

I'm trying to grasp the State monad, and with this purpose in mind I wanted to write monadic code that would generate a sequence of random numbers using a Linear Congruential Generator (probably not a good one, but my intention is just to learn the State monad, not to build a good RNG library).
The generator is just this (I want to generate a sequence of Bools for simplicity):
type Seed = Int
random :: Seed -> (Bool, Seed)
random seed = let (a, c, m) = (1664525, 1013904223, 2^32) -- some params for the LCG
                  seed' = (a*seed + c) `mod` m
              in (even seed', seed') -- return True/False if seed' is even/odd
Don't worry about the numbers; this is just an update rule for the seed that (according to Numerical Recipes) should generate a pseudo-random sequence of Ints. Now, if I want to generate random numbers sequentially, I'd do:
rand3Bools :: Seed -> ([Bool], Seed)
rand3Bools seed0 = let (b1, seed1) = random seed0
                       (b2, seed2) = random seed1
                       (b3, seed3) = random seed2
                   in ([b1,b2,b3], seed3)
Ok, so I could avoid this boilerplate by using a State Monad:
import Control.Monad.State
data Random = Random { seed :: Seed, value :: Bool }

nextVal = do
  Random seed val <- get
  let seed' = updateSeed seed
      val' = even seed'
  put (Random seed' val')
  return val'

updateSeed seed = let (a, c, m) = (1664525, 1013904223, 2^32) in (a*seed + c) `mod` m
And finally:
getNRandSt n = replicateM n nextVal

getNRand :: Int -> Seed -> [Bool]
getNRand n seed = evalState (getNRandSt n) (Random seed True)
Ok, this works fine and gives me a list of n pseudo-random Bools for each given seed. But...
I can read what I've done (mainly based on this example: http://www.haskell.org/pipermail/beginners/2008-September/000275.html ) and replicate it to do other things. But I don't think I can understand what's really happening behind the do-notation and monadic functions (like replicateM).
Can anyone help me with some of these doubts?
1 - I've tried to desugar the nextVal function to understand what it does, but I couldn't. I can guess it extracts the current state, updates it and then passes the state ahead to the next computation, but this is just based on reading the do-sugar as if it were English.
How do I really desugar this function to the original >>= and return functions step-by-step?
2 - I couldn't grasp what exactly the put and get functions do. I can guess that they "pack" and "unpack" the state. But the mechanics behind the do-sugar are still elusive to me.
Well, any other general remarks about this code are very welcome. I sometimes feel with Haskell that I can write code that works and does what I expect it to do, but I can't "follow the evaluation" the way I'm accustomed to with imperative programs.
The State monad does look kind of confusing at first; let's do as Norman Ramsey suggested, and walk through how to implement from scratch. Warning, this is pretty lengthy!
First, State has two type parameters: the type of the contained state data and the type of the final result of the computation. We'll use stateData and result respectively as type variables for them here. This makes sense if you think about it; the defining characteristic of a State-based computation is that it modifies a state while producing an output.
Less obvious is that the data constructor wraps a function from a state to a result and a modified state, like so:
newtype State stateData result = State (stateData -> (result, stateData))
So while the monad is called "State", the actual value wrapped by the monad is that of a State-based computation, not the actual value of the contained state.
Keeping that in mind, we shouldn't be surprised to find that the function runState used to execute a computation in the State monad is actually nothing more than an accessor for the wrapped function itself, and could be defined like this:
runState (State f) = f
So what does it mean when you define a function that returns a State value? Let's ignore for a moment the fact that State is a monad, and just look at the underlying types. First, consider this function (which doesn't actually do anything with the state):
len2State :: String -> State Int Bool
len2State s = return ((length s) == 2)
If you look at the definition of State, we can see that here the stateData type is Int, and the result type is Bool, so the function wrapped by the data constructor must have the type Int -> (Bool, Int). Now, imagine a State-less version of len2State--obviously, it would have type String -> Bool. So how would you go about converting such a function into one returning a value that fits into a State wrapper?
Well, obviously, the converted function will need to take a second parameter, an Int representing the state value. It also needs to return a state value, another Int. Since we're not actually doing anything with the state in this function, let's just do the obvious thing--pass that int right on through. Here's a State-shaped function, defined in terms of the State-less version:
len2 :: String -> Bool
len2 s = ((length s) == 2)
len2State :: String -> (Int -> (Bool, Int))
len2State s i = (len2 s, i)
But that's kind of silly and redundant. Let's generalize the conversion so that we can pass in the result value, and turn anything into a State-like function.
convert :: Bool -> (Int -> (Bool, Int))
convert r d = (r, d)
len2 s = ((length s) == 2)
len2State :: String -> (Int -> (Bool, Int))
len2State s = convert (len2 s)
What if we want a function that changes the state? Obviously we can't build one with convert, since we wrote that to pass the state through. Let's keep it simple, and write a function to overwrite the state with a new value. What kind of type would it need? It'll need an Int for the new state value, and of course will have to return a function stateData -> (result, stateData), because that's what our State wrapper needs. Overwriting the state value doesn't really have a sensible result value outside the State computation, so our result here will just be (), the zero-element tuple that represents "no value" in Haskell.
overwriteState :: Int -> (Int -> ((), Int))
overwriteState newState _ = ((), newState)
That was easy! Now, let's actually do something with that state data. Let's rewrite len2State from above into something more sensible: we'll compare the string length to the current state value.
lenState :: String -> (Int -> (Bool, Int))
lenState s i = ((length s) == i, i)
Can we generalize this into a converter and a State-less function, like we did before? Not quite as easily. Our len function will need to take the state as an argument, but we don't want it to "know about" state. Awkward, indeed. However, we can write a quick helper function that handles everything for us: we'll give it a function that needs to use the state value, and it'll pass the value in and then package everything back up into a State-shaped function leaving len none the wiser.
useState :: (Int -> Bool) -> Int -> (Bool, Int)
useState f d = (f d, d)
len :: String -> Int -> Bool
len s i = (length s) == i
lenState :: String -> (Int -> (Bool, Int))
lenState s = useState (len s)
Now, the tricky part--what if we want to string these functions together? Let's say we want to use lenState on a string, then double the state value if the result is false, then check the string again, and finally return true if either check succeeded. We have all the parts we need for this task, but writing it all out would be a pain. Can we make a function that automatically chains together two functions that each return State-like functions? Sure thing! We just need to make sure it takes as arguments two things: the State function returned by the first function, and a function that takes the prior function's result type as an argument. Let's see how it turns out:
chainStates :: (Int -> (result1, Int)) -> (result1 -> (Int -> (result2, Int))) -> (Int -> (result2, Int))
chainStates prev f d = let (r, d') = prev d
                       in f r d'
All this is doing is applying the first state function to some state data, then applying the second function to the result and the modified state data. Simple, right?
Now, the interesting part: Between chainStates and convert, we should almost be able to turn any combination of State-less functions into a State-enabled function! The only thing we need now is a replacement for useState that returns the state data as its result, so that chainStates can pass it along to the functions that don't know anything about the trick we're pulling on them. Also, we'll use lambdas to accept the result from the previous functions and give them temporary names. Okay, let's make this happen:
extractState :: Int -> (Int, Int)
extractState d = (d, d)
chained :: String -> (Int -> (Bool, Int))
chained str = chainStates extractState $ \state1 ->
                let check1 = (len str state1) in
                chainStates (overwriteState (if check1
                                               then state1
                                               else state1 * 2)) $ \_ ->
                chainStates extractState $ \state2 ->
                let check2 = (len str state2) in
                convert (check1 || check2)
And try it out:
> chained "abcd" 2
(True, 4)
> chained "abcd" 3
(False, 6)
> chained "abcd" 4
(True, 4)
> chained "abcdef" 5
(False, 10)
Of course, we can't forget that State is actually a monad that wraps the State-like functions and keeps us away from them, so none of our nifty functions that we've built will help us with the real thing. Or will they? In a shocking twist, it turns out that the real State monad provides all the same functions, under different names:
runState (State s) = s
return r = State (convert r)
(>>=) s f = State (\d -> let (r, d') = (runState s) d
                         in runState (f r) d')
get = State extractState
put d = State (overwriteState d)
Note that >>= is almost identical to chainStates, but there was no good way to define it using chainStates. So, to wrap things up, we can rewrite the final example using the real State:
chained str = get >>= \state1 ->
                let check1 = (len str state1) in
                put (if check1
                       then state1
                       else state1 * 2) >>= \_ ->
                get >>= \state2 ->
                let check2 = (len str state2) in
                return (check1 || check2)
Or, all candied up with the equivalent do notation:
chained str = do
  state1 <- get
  let check1 = len str state1
  _ <- put (if check1 then state1 else state1 * 2)
  state2 <- get
  let check2 = (len str state2)
  return (check1 || check2)
First of all, your example is overly complicated because it doesn't need to store the val in the state monad; only the seed is the persistent state. Second, I think you will have better luck if instead of using the standard state monad, you re-implement all of the state monad and its operations yourself, with their types. I think you will learn more this way. Here are a couple of declarations to get you started:
data MyState s a = MyState (s -> (s, a))

get :: MyState s s
put :: s -> MyState s ()
Then you can write your own connectives:
unit :: a -> MyState s a
bind :: MyState s a -> (a -> MyState s b) -> MyState s b
Finally
data Seed = Seed Int
nextVal :: MyState Seed Bool
As for your trouble desugaring, the do notation you are using is pretty sophisticated.
But desugaring is a line-at-a-time mechanical procedure. As near as I can make out, your code should desugar like this (going back to your original types and code, which I disagree with):
nextVal = get >>= \(Random seed val) ->
            let seed' = updateSeed seed
                val' = even seed'
            in put (Random seed' val') >>= \_ -> return val'
In order to make the nesting structure a bit clearer, I've taken major liberties with the indentation.
You've got a couple of great responses. What I do when working with the State monad is mentally replace State s a with s -> (s,a) (after all, that's really what it is).
You then get a type for bind that looks like:
(>>=) :: (s -> (s,a)) ->
         (a -> s -> (s,b)) ->
         (s -> (s,b))
and you see that bind is just a specialized kind of function composition operator, like (.)
I wrote a blog/tutorial on the state monad here. It's probably not particularly good, but helped me grok things a little better by writing it.
