understanding do notation and bindings - haskell

I am very new to haskell and I am trying to understand the methodology used to create Monadic parser in this document https://www.cs.nott.ac.uk/~gmh/pearl.pdf
Instead of following it exactly, I am trying to do it a little bit differently in order to understand it correctly, therefore, I ended up with this code
newtype Parser a = Parser (String -> Maybe (a, String))
item :: Parser Char
item = Parser (\cs -> case cs of
    "" -> Nothing
    (c:cs) -> Just (c, cs))
getParser (Parser x) = x
instance Monad Parser where
    return x = Parser (\cs -> Just (x,cs))
    (Parser p) >>= f = Parser (\cs -> let result = p cs in
                                      case result of
                                        Nothing -> Nothing
                                        Just (c,cs') -> getParser (f c) cs')
takeThreeDropSecond :: Parser (Char, Char)
takeThreeDropSecond = do
    c1 <- item
    item
    c2 <- item
    return (c1, c2)
This seems to be working, but I am having a hard time following what is going on in do notation.
For example, in c1 <- item, what is assigned to c1? Is it the function that is contained in the Parser type, or the result of that computation, or something else? Moreover, the second line in the do block is just item, so does it just run item but not assign the result? And finally, what does return (c1, c2) produce? Is it Parser (String -> Maybe ((c1, c2), String)) or just Just (c1, c2)?

The Parser type wraps a function that 1) can represent failure using Maybe, 2) returns the remaining, unparsed text through the (a, String) pair, and 3) returns along with it some parsed value a, which can be anything. The Monad instance is the plumbing that ties these together. The return implementation creates a Parser around a function that 1) succeeds with Just something, 2) does not modify its input text, and 3) directly passes along the value given to it. The >>= implementation takes a parser and a function, and returns a new parser that first runs p: if p fails, the whole parser fails; if p succeeds, its result is passed to f and the parser that f returns is run on the remaining input.
In takeThreeDropSecond, first c1 <- item says "parse the given input using item, assign its result to c1, and feed the rest of the input forward". This does not assign the function inside the item parser to c1; it assigns the result of running the function inside item against the current input. Then you reach item, which parses a value using item, doesn't assign it to anything, and feeds the rest of the input forward. Next you reach c2 <- item, which does basically the same thing as the first line, and finally return (c1, c2), which would expand to Parser (\cs -> Just ((c1, c2), cs)). This means that return (c1, c2) has the type Parser (Char, Char). With type annotations it would be
-- Pattern signatures such as (c1 :: Char) need the ScopedTypeVariables extension.
takeThreeDropSecond :: Parser (Char, Char)
takeThreeDropSecond = do
    (c1 :: Char) <- (item :: Parser Char)
    (item :: Parser Char)
    (c2 :: Char) <- (item :: Parser Char)
    (return (c1, c2) :: Parser (Char, Char))
Note that the last statement of a do block must be an expression, and its type is the type of the whole block. Since return (c1, c2) has type Parser (Char, Char), so must takeThreeDropSecond, and vice versa.
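To make the sequencing concrete, here is a sketch of the same parser with the do block expanded by hand into >>= calls. The Functor and Applicative instances are an addition (recent GHC versions require them before a Monad instance is accepted), getParser is given an explicit signature, and the name takeThreeDropSecond' is made up for this illustration.

```haskell
newtype Parser a = Parser (String -> Maybe (a, String))

getParser :: Parser a -> String -> Maybe (a, String)
getParser (Parser p) = p

item :: Parser Char
item = Parser $ \cs -> case cs of
  ""       -> Nothing
  (c:rest) -> Just (c, rest)

-- Added for this sketch: modern GHC needs these before a Monad instance.
instance Functor Parser where
  fmap f (Parser p) = Parser $ \cs -> case p cs of
    Nothing       -> Nothing
    Just (a, cs') -> Just (f a, cs')

instance Applicative Parser where
  pure x = Parser $ \cs -> Just (x, cs)
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad Parser where
  Parser p >>= f = Parser $ \cs -> case p cs of
    Nothing       -> Nothing
    Just (a, cs') -> getParser (f a) cs'

-- The do block from the question, written with explicit binds.
-- Each lambda argument is the *result* of the parser to its left.
takeThreeDropSecond' :: Parser (Char, Char)
takeThreeDropSecond' =
  item >>= \c1 ->
  item >>= \_  ->
  item >>= \c2 ->
  return (c1, c2)
```

Running getParser takeThreeDropSecond' "abcd" gives Just (('a','c'),"d"), while an input shorter than three characters, such as "ab", gives Nothing because the third item fails.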

Related

How can I declare the types for this problem?

I'm trying to create a simple programming language with some primitives and user defined functions.
These are the types I created:
data Type = IntT | BoolT
data Value = IntV Int | BoolV Bool | OperatorCall String [Value]
data Expr = LetE String Value | ProcedureCall String [Value]
As you can see, I've divided functions into operators (which return a value) and procedures (which don't return anything and act as expressions instead of values). A function call contains the string id of the function being called and the list of arguments being passed. Also, a program is just a list of expressions (I've omitted user defined functions here for the sake of simplicity)
My problem comes from the fact that I need to write a function that parses a function call from a string:
parseFunctionCall :: String -> ???
...
The return type of that function can be a Value (for operator calls) or an Expr (for procedure calls). This function is rather complicated and I'd prefer to avoid writing it twice, or polluting it with an Either return type. What should I do? How can I change my types so that this can be achieved cleanly? Something like this perhaps, but I don't think this is the way:
type FunctionCall = (String, [Value])
data Value = ... | OperatorCall FunctionCall
data Expr = ... | ProcedureCall FunctionCall
parseAsFunctionCall :: String -> FunctionCall
...
You can have the function call parser return (String, [Value]), and let the caller fix that up into whatever data structure they like best -- in your case, by applying \(s, vs) -> OperatorCall s vs if parsing a value or \(s, vs) -> ProcedureCall s vs if parsing an expression.
parseFunctionCall :: Parser (String, [Value])
parseLiteralInt :: Parser Int
parseLiteralBool :: Parser Bool
parseLet :: Parser (String, Value)
(parseFunctionCall, parseLiteralInt, parseLiteralBool, parseLet) = {- ... -}
parseValue :: Parser Value
parseValue =
    ((\(s, vs) -> OperatorCall s vs) <$> parseFunctionCall)
    <|>
    (IntV <$> parseLiteralInt)
    <|>
    (BoolV <$> parseLiteralBool)
parseExpr :: Parser Expr
parseExpr =
    ((\(s, vs) -> ProcedureCall s vs) <$> parseFunctionCall)
    <|>
    ((\(s, v) -> LetE s v) <$> parseLet)
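As a runnable illustration of the same idea, here is a minimal sketch that uses ReadP from base in place of the unspecified Parser type. The choice of ReadP, the concrete syntax name(arg,arg,...), the literals true and false, and the helper run are all assumptions made for this example; parsing of let is left out to keep it short.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP

data Value = IntV Int | BoolV Bool | OperatorCall String [Value]
  deriving (Show, Eq)

data Expr = LetE String Value | ProcedureCall String [Value]
  deriving (Show, Eq)

-- The shared parser: it only knows about a name and an argument list.
-- Assumed syntax for this sketch: name(arg,arg,...)
parseFunctionCall :: ReadP (String, [Value])
parseFunctionCall = do
  name <- munch1 isAlpha
  args <- between (char '(') (char ')') (parseValue `sepBy` char ',')
  return (name, args)

parseLiteralInt :: ReadP Int
parseLiteralInt = read <$> munch1 isDigit

parseLiteralBool :: ReadP Bool
parseLiteralBool = (True <$ string "true") <|> (False <$ string "false")

-- Each caller wraps the shared result in its own constructor.
parseValue :: ReadP Value
parseValue =
      (uncurry OperatorCall <$> parseFunctionCall)
  <|> (IntV <$> parseLiteralInt)
  <|> (BoolV <$> parseLiteralBool)

parseExpr :: ReadP Expr
parseExpr = uncurry ProcedureCall <$> parseFunctionCall

-- Hypothetical helper: keep only parses that consume the whole input.
run :: ReadP a -> String -> [a]
run p s = [a | (a, "") <- readP_to_S p s]
```

With these definitions, run parseExpr "print(1,add(2,3))" yields [ProcedureCall "print" [IntV 1,OperatorCall "add" [IntV 2,IntV 3]]]: the same parseFunctionCall produces a ProcedureCall at the top level and an OperatorCall in argument position.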

Haskell parser combinator - do notation

I was reading a tutorial regarding building a parser combinator library and i came across a method which i don't quite understand.
newtype Parser a = Parser {parse :: String -> [(a,String)]}
chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op a = (p `chainl1` op) <|> return a
chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
p `chainl1` op = do {a <- p; rest a}
  where rest a = (do f <- op
                     b <- p
                     rest (f a b))
                 <|> return a
bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s
bind is the implementation of the (>>=) operator. I don't quite get how the chainl1 function works. From what I can see, you extract f from op, then apply it to a and b (f a b), and recurse. However, I do not get how you can extract a function from the parser when it should return a list of tuples.
Start by looking at the definition of Parser:
newtype Parser a = Parser {parse :: String -> [(a,String)]}
A Parser a is really just a wrapper around a function (that we can run later with parse) that takes a String and returns a list of pairs, where each pair contains an a encountered when processing the string, along with the rest of the string that remains to be processed.
Now look at the part of the code in chainl1 that's confusing you: the part where you extract f from op:
f <- op
You remarked: "I do not get how you extract a function from the parser when it should return a list of tuples."
It's true that when we run a Parser a with a string (using parse), we get a list of type [(a,String)] as a result. But this code does not say parse op s. Rather, we are using bind here (with the do-notation syntactic sugar). The problem is that you're thinking about the definition of the Parser datatype, but you're not thinking much about what bind specifically does.
Let's look at what bind is doing in the Parser monad a bit more carefully.
bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s
What does p >>= f do? It returns a Parser that, when given a string s, does the following: First, it runs parser p with the string to be parsed, s. This, as you correctly noted, returns a list of type [(a, String)]: i.e. a list of the values of type a encountered, along with the string that remained after each value was encountered. Then it takes this list of pairs and applies a function to each pair. Specifically, each (a, s') pair in this list is transformed by (1) applying f to the parsed value a (f a returns a new parser), and then (2) running this new parser with the remaining string s'. This is a function from a tuple to a list of tuples: (a, s') -> [(b, s'')]... and since we're mapping this function over every tuple in the original list returned by parse p s, this ends up giving us a list of lists of tuples: [[(b, s'')]]. So we concatenate (or join) this list into a single list [(b, s'')]. All in all then, we have a function from s to [(b, s'')], which we then wrap in a Parser newtype.
The crucial point is that f <- op, or equivalently op >>= \f -> ..., binds the name f to each value parsed by op. f is not a list of tuples, because it is not the result of running parse op s.
In general, you'll see a lot of Haskell code that defines some datatype SomeMonad a, along with a bind method that hides a lot of the dirty details for you, and lets you get access to the a values you care about using do-notation like so: a <- ma. It may be instructive to look at the State monad to see how bind passes around state behind the scenes for you. Similarly, here, when combining parsers, you care most about the values the parser is supposed to recognize... bind is hiding all the dirty work that involves the strings that remain upon recognizing a value of type a.
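A small self-contained sketch may help to see this in action. It reuses the Parser newtype and chainl1 from the question; the instances and the helpers satisfy, digit and minus are assumptions written for this example (the tutorial's own definitions may differ, in particular its <|>).

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (digitToInt, isDigit)

newtype Parser a = Parser { parse :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> [(f a, s') | (a, s') <- p s]

instance Applicative Parser where
  pure a = Parser $ \s -> [(a, s)]
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad Parser where
  -- Same body as bind in the question.
  p >>= f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') (parse p s)

instance Alternative Parser where
  empty = Parser (const [])
  -- Assumed here: try the second parser only if the first one fails.
  p <|> q = Parser $ \s -> case parse p s of
    [] -> parse q s
    rs -> rs

satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case s of
  (c:cs) | ok c -> [(c, cs)]
  _             -> []

digit :: Parser Int
digit = digitToInt <$> satisfy isDigit

-- A parser whose parsed value is itself a function.
minus :: Parser (Int -> Int -> Int)
minus = (-) <$ satisfy (== '-')

chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
p `chainl1` op = p >>= rest
  where
    rest a = step a <|> return a
    step a = do
      f <- op          -- f is the function, not the list of tuples
      b <- p
      rest (f a b)
```

Here parse minus "-x" is a one-element list whose first component is the function (-) and whose second is "x", whereas inside chainl1 the name f is bound to that function directly. parse (digit `chainl1` minus) "9-3-2" evaluates to [(4,"")], which shows the left-associative folding: (9 - 3) - 2.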

How can I work in nested monads cleanly?

I'm writing an interpreter for a small language.
This language supports mutation, so its evaluator keeps track of a Store for all the variables (where type Store = Map.Map Address Value, type Address = Int, and data Value is a language-specific ADT).
It's also possible for computations to fail (e.g., dividing by zero), so the result has to be an Either String Value.
The type of my interpreter, then, is
eval :: Environment -> Expression -> State Store (Either String Value)
where type Environment = Map.Map Identifier Address keeps track of local bindings.
For example, interpreting a constant literal doesn't need to touch the store, and the result always succeeds, so
eval _ (LiteralExpression v) = return $ Right v
But when we apply a binary operator, we do need to consider the store.
For example, if the user evaluates (+ (x <- (+ x 1)) (x <- (+ x 1))) and x is initially 0, then the final result should be 3, and x should be 2 in the resulting store.
This leads to the case
eval env (BinaryOperator op l r) = do
    lval <- eval env l
    rval <- eval env r
    return $ join $ liftM2 (applyBinop op) lval rval
Note that the do-notation is working within the State Store monad.
Furthermore, the use of return is monomorphic in State Store, while the uses of join and liftM2 are monomorphic in the Either String monad.
That is, here we use
(return . join) :: Either String (Either String Value) -> State Store (Either String Value)
and return . join is not a no-op.
(As is evident, applyBinop :: Identifier -> Value -> Value -> Either String Value.)
This seems confusing at best, and this is a relatively simple case.
The case of function application, for example, is considerably more complicated.
What useful best practices should I know about to keep my code readable—and writable?
EDIT: Here's a more typical example, which better showcases the ugliness.
The NewArrayC variant has parameters length :: Expression and element :: Expression (it creates an array of a given length with all elements initialized to a constant).
A simple example is (newArray 3 "foo"), which yields ["foo", "foo", "foo"], but we could also write (newArray (+ 1 2) (concat "fo" "oo")), because we can have arbitrary expressions in a NewArrayC.
But when we actually call
allocateMany :: Int -> Value -> State Store Address,
which takes the number of elements to allocate and the value for each slot, and returns the starting address, we need to unpack those values.
In the logic below, you can see that I'm duplicating a bunch of logic that should be built-in to the Either monad.
All the cases should just be binds.
eval env (NewArrayC len el) = do
    lenVal <- eval env len
    elVal <- eval env el
    case lenVal of
        Right (NumV lenNum) -> case elVal of
            Right val -> do
                addr <- allocateMany lenNum val
                return $ Right $ ArrayV addr lenNum -- result data type
            left -> return left
        Right _ -> return $ Left "expected number in new-array length"
        left -> return left
This is what monad transformers are for. There is a StateT transformer to add state to a stack, and an EitherT transformer to add Either-like failure to a stack; however, I prefer ExceptT (which adds Except-like failure), so I will give my discussion in terms of that. Since you want the stateful bit outermost, you should use ExceptT e (State s) as your monad.
type DSL = ExceptT String (State Store)
Note that the stateful operations can be spelled get and put, and these are polymorphic over all instances of MonadState; so that in particular they will work okay in our DSL monad. Similarly, the canonical way to raise an error is throwError, which is polymorphic over all instances of MonadError String; and in particular will work okay in our DSL monad.
So now we would write
eval :: Environment -> Expression -> DSL Value
eval _ (Literal v) = return v
eval e (Binary op l r) = do
    lv <- eval e l
    rv <- eval e r
    liftEither (applyBinop op lv rv) -- applyBinop still returns Either String Value
You might also consider giving eval a more polymorphic type; it could return an (MonadError String m, MonadState Store m) => m Value instead of a DSL Value. In fact, for allocateMany, it's important that you give it a polymorphic type:
allocateMany :: MonadState Store m => Int -> Value -> m Address
There are two points of interest about this type. First, because it is polymorphic over all MonadState Store m instances, you can be just as sure that it only has stateful side effects as if it had the type Int -> Value -> State Store Address that you suggested. Second, also because it is polymorphic, it can be specialized to return a DSL Address, so it can be used in (for example) eval. Your example eval code becomes this:
eval env (NewArrayC len el) = do
    lenVal <- eval env len
    elVal <- eval env el
    case lenVal of
        NumV lenNum -> do
            addr <- allocateMany lenNum elVal
            return (ArrayV addr lenNum)
        _ -> throwError "expected number in new-array length"
I think that's quite readable, really; nothing too extraneous there.
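For a version that can be loaded and run, here is a minimal sketch of the ExceptT String (State Store) stack using mtl. The cut-down Value and Expression types, the Div case, the allocation strategy in allocateMany and the helper runDSL are all assumptions invented for this example; the environment argument is dropped to keep it short.

```haskell
import Control.Monad.Except (ExceptT, runExceptT, throwError)
import Control.Monad.State (State, get, put, runState)
import qualified Data.Map as Map

type Address = Int
data Value = NumV Int | ArrayV Address Int deriving (Show, Eq)
type Store = Map.Map Address Value

type DSL = ExceptT String (State Store)

-- Hypothetical, simplified expression type for this sketch.
data Expression
  = Literal Value
  | Div Expression Expression
  | NewArray Expression Expression

-- Allocate n consecutive slots holding v; return the first address.
allocateMany :: Int -> Value -> DSL Address
allocateMany n v = do
  store <- get
  let start = Map.size store
  put (foldr (\a -> Map.insert a v) store [start .. start + n - 1])
  return start

eval :: Expression -> DSL Value
eval (Literal v) = return v
eval (Div l r) = do
  lv <- eval l
  rv <- eval r
  case (lv, rv) of
    (NumV _, NumV 0) -> throwError "division by zero"
    (NumV a, NumV b) -> return (NumV (a `div` b))
    _                -> throwError "expected numbers"
eval (NewArray len el) = do
  lenVal <- eval len
  elVal  <- eval el
  case lenVal of
    NumV n -> do
      addr <- allocateMany n elVal
      return (ArrayV addr n)
    _ -> throwError "expected number in new-array length"

-- Peel the layers off: first the error layer, then the state layer.
runDSL :: DSL a -> Store -> (Either String a, Store)
runDSL = runState . runExceptT
```

For example, runDSL (eval (NewArray (Literal (NumV 2)) (Literal (NumV 7)))) Map.empty returns Right (ArrayV 0 2) together with a store holding NumV 7 at addresses 0 and 1. Because State is the inner monad, a failing evaluation still returns the store as it was at the point of failure, paired with a Left.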

Convert String to Tuple, Special Formatting in Haskell

For a test app, I'm trying to convert a special type of string to a tuple. The string is always in the following format, with an int (n>=1) followed by a character.
Examples of Input String:
"2s"
"13f"
"1b"
Examples of Desired Output Tuples (Int, Char):
(2, 's')
(13, 'f')
(1, 'b')
Any pointers would be extremely appreciated. Thanks.
You can use reads to parse the int and get the rest of the string:
readTup :: String -> (Int, Char)
readTup s = (n, head rest)
  where [(n, rest)] = reads s
A safer version would be:
maybeReadTup :: String -> Maybe (Int, Char)
maybeReadTup s = do
  [(n, [c])] <- return $ reads s
  return (n, c)
Here's one way to do it:
import Data.Maybe (listToMaybe)
parseTuple :: String -> Maybe (Int, Char)
parseTuple s = do
  (int, (char:_)) <- listToMaybe $ reads s
  return (int, char)
This uses the Maybe monad to express possible parse failure. Note that if the (char:_) pattern fails to match (i.e., if there is only a number with no character after it), the result is Nothing. This is due to how do notation works in Haskell: when a pattern match in a bind fails, it calls the monad's fail function (a method of the MonadFail class in current GHC), and for Maybe we have fail _ = Nothing. The function also evaluates to Nothing if reads can't read an Int at the beginning of the input: in that case reads gives [], which listToMaybe turns into Nothing.
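The pattern-match behaviour described above can be sketched by writing out roughly what the do block turns into. parseTuple' is a hypothetical name for the hand-expanded version, and the message passed to fail is a placeholder (Maybe ignores it).

```haskell
import Data.Maybe (listToMaybe)

parseTuple :: String -> Maybe (Int, Char)
parseTuple s = do
  (int, char : _) <- listToMaybe (reads s)
  return (int, char)

-- Roughly what the do block above expands to: a failed pattern
-- match becomes a call to fail, which is Nothing for Maybe.
parseTuple' :: String -> Maybe (Int, Char)
parseTuple' s =
  listToMaybe (reads s) >>= \r -> case r of
    (int, char : _) -> return (int, char)
    _               -> fail "pattern match failure"
```

Both versions give Just (13,'f') for "13f", and Nothing for "13" (no character after the number) and for "x" (no number to read). Note that anything after the first character is ignored, so "13fg" also gives Just (13,'f').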

State Monad, sequences of random numbers and monadic code

I'm trying to grasp the State Monad and with this purpose I wanted to write a monadic code that would generate a sequence of random numbers using a Linear Congruential Generator (probably not good, but my intention is just to learn the State Monad, not build a good RNG library).
The generator is just this (I want to generate a sequence of Bools for simplicity):
type Seed = Int
random :: Seed -> (Bool, Seed)
random seed = let (a, c, m) = (1664525, 1013904223, 2^32) -- some params for the LCG
                  seed' = (a*seed + c) `mod` m
              in (even seed', seed') -- return True/False if seed' is even/odd
Don't worry about the numbers, this is just an update rule for the seed that (according to Numerical Recipes) should generate a pseudo-random sequence of Ints. Now, if I want to generate random numbers sequentially I'd do:
rand3Bools :: Seed -> ([Bool], Seed)
rand3Bools seed0 = let (b1, seed1) = random seed0
                       (b2, seed2) = random seed1
                       (b3, seed3) = random seed2
                   in ([b1,b2,b3], seed3)
Ok, so I could avoid this boilerplate by using a State Monad:
import Control.Monad.State
data Random = Random {seed :: Seed, value :: Bool}
nextVal = do
  Random seed val <- get
  let seed' = updateSeed seed
      val' = even seed'
  put (Random seed' val')
  return val'
updateSeed seed = let (a,c,m) = (1664525, 1013904223, 2^32) in (a*seed + c) `mod` m
And finally:
getNRandSt n = replicateM n nextVal
getNRand :: Int -> Seed -> [Bool]
getNRand n seed = evalState (getNRandSt n) (Random seed True)
Ok, this works fine and give me a list of n pseudo-random Bools for each given seed. But...
I can read what I've done (mainly based on this example: http://www.haskell.org/pipermail/beginners/2008-September/000275.html ) and replicate it to do other things. But I don't think I can understand what's really happening behind the do-notation and monadic functions (like replicateM).
Can anyone help me with some of these doubts?
1 - I've tried to desugar the nextVal function to understand what it does, but I couldn't. I can guess it extracts the current state, updates it and then pass the state ahead to the next computation, but this is just based on reading this do-sugar as if it was english.
How do I really desugar this function to the original >>= and return functions step-by-step?
2 - I couldn't grasp what exactly the put and get functions do. I can guess that they "pack" and "unpack" the state. But the mechanics behind the do-sugar is still elusive to me.
Well, any other general remarks about this code are very welcome. I sometimes feel with Haskell that I can write code that works and does what I expect it to do, but I can't "follow the evaluation" as I'm accustomed to doing with imperative programs.
The State monad does look kind of confusing at first; let's do as Norman Ramsey suggested, and walk through how to implement it from scratch. Warning, this is pretty lengthy!
First, State has two type parameters: the type of the contained state data and the type of the final result of the computation. We'll use stateData and result respectively as type variables for them here. This makes sense if you think about it; the defining characteristic of a State-based computation is that it modifies a state while producing an output.
Less obvious is that the type constructor takes a function from a state to a modified state and result, like so:
newtype State stateData result = State (stateData -> (result, stateData))
So while the monad is called "State", the actual value wrapped by the monad is that of a State-based computation, not the actual value of the contained state.
Keeping that in mind, we shouldn't be surprised to find that the function runState used to execute a computation in the State monad is actually nothing more than an accessor for the wrapped function itself, and could be defined like this:
runState (State f) = f
So what does it mean when you define a function that returns a State value? Let's ignore for a moment the fact that State is a monad, and just look at the underlying types. First, consider this function (which doesn't actually do anything with the state):
len2State :: String -> State Int Bool
len2State s = return ((length s) == 2)
If you look at the definition of State, we can see that here the stateData type is Int, and the result type is Bool, so the function wrapped by the data constructor must have the type Int -> (Bool, Int). Now, imagine a State-less version of len2State--obviously, it would have type String -> Bool. So how would you go about converting such a function into one returning a value that fits into a State wrapper?
Well, obviously, the converted function will need to take a second parameter, an Int representing the state value. It also needs to return a state value, another Int. Since we're not actually doing anything with the state in this function, let's just do the obvious thing--pass that int right on through. Here's a State-shaped function, defined in terms of the State-less version:
len2 :: String -> Bool
len2 s = ((length s) == 2)
len2State :: String -> (Int -> (Bool, Int))
len2State s i = (len2 s, i)
But that's kind of silly and redundant. Let's generalize the conversion so that we can pass in the result value, and turn anything into a State-like function.
convert :: Bool -> (Int -> (Bool, Int))
convert r d = (r, d)
len2 s = ((length s) == 2)
len2State :: String -> (Int -> (Bool, Int))
len2State s = convert (len2 s)
What if we want a function that changes the state? Obviously we can't build one with convert, since we wrote that to pass the state through. Let's keep it simple, and write a function to overwrite the state with a new value. What kind of type would it need? It'll need an Int for the new state value, and of course will have to return a function stateData -> (result, stateData), because that's what our State wrapper needs. Overwriting the state value doesn't really have a sensible result value outside the State computation, so our result here will just be (), the zero-element tuple that represents "no value" in Haskell.
overwriteState :: Int -> (Int -> ((), Int))
overwriteState newState _ = ((), newState)
That was easy! Now, let's actually do something with that state data. Let's rewrite len2State from above into something more sensible: we'll compare the string length to the current state value.
lenState :: String -> (Int -> (Bool, Int))
lenState s i = ((length s) == i, i)
Can we generalize this into a converter and a State-less function, like we did before? Not quite as easily. Our len function will need to take the state as an argument, but we don't want it to "know about" state. Awkward, indeed. However, we can write a quick helper function that handles everything for us: we'll give it a function that needs to use the state value, and it'll pass the value in and then package everything back up into a State-shaped function leaving len none the wiser.
useState :: (Int -> Bool) -> Int -> (Bool, Int)
useState f d = (f d, d)
len :: String -> Int -> Bool
len s i = (length s) == i
lenState :: String -> (Int -> (Bool, Int))
lenState s = useState (len s)
Now, the tricky part--what if we want to string these functions together? Let's say we want to use lenState on a string, then double the state value if the result is false, then check the string again, and finally return true if either check did. We have all the parts we need for this task, but writing it all out would be a pain. Can we make a function that automatically chains together two functions that each return State-like functions? Sure thing! We just need to make sure it takes as arguments two things: the State function returned by the first function, and a function that takes the prior function's result type as an argument. Let's see how it turns out:
chainStates :: (Int -> (result1, Int)) -> (result1 -> (Int -> (result2, Int))) -> (Int -> (result2, Int))
chainStates prev f d = let (r, d') = prev d
                       in f r d'
All this is doing is applying the first state function to some state data, then applying the second function to the result and the modified state data. Simple, right?
Now, the interesting part: Between chainStates and convert, we should almost be able to turn any combination of State-less functions into a State-enabled function! The only thing we need now is a replacement for useState that returns the state data as its result, so that chainStates can pass it along to the functions that don't know anything about the trick we're pulling on them. Also, we'll use lambdas to accept the result from the previous functions and give them temporary names. Okay, let's make this happen:
extractState :: Int -> (Int, Int)
extractState d = (d, d)
chained :: String -> (Int -> (Bool, Int))
chained str = chainStates extractState $ \state1 ->
    let check1 = (len str state1) in
    chainStates (overwriteState (
        if check1
        then state1
        else state1 * 2)) $ \ _ ->
    chainStates extractState $ \state2 ->
    let check2 = (len str state2) in
    convert (check1 || check2)
And try it out:
> chained "abcd" 2
(True, 4)
> chained "abcd" 3
(False, 6)
> chained "abcd" 4
(True, 4)
> chained "abcdef" 5
(False, 10)
Of course, we can't forget that State is actually a monad that wraps the State-like functions and keeps us away from them, so none of our nifty functions that we've built will help us with the real thing. Or will they? In a shocking twist, it turns out that the real State monad provides all the same functions, under different names:
runState (State s) = s
return r = State (convert r)
(>>=) s f = State (\d -> let (r, d') = (runState s) d in
                         runState (f r) d')
get = State extractState
put d = State (overwriteState d)
Note that >>= is almost identical to chainStates, but there was no good way to define it using chainStates. So, to wrap things up, we can rewrite the final example using the real State:
chained str = get >>= \state1 ->
    let check1 = (len str state1) in
    put (if check1
        then state1 else state1 * 2) >>= \ _ ->
    get >>= \state2 ->
    let check2 = (len str state2) in
    return (check1 || check2)
Or, all candied up with the equivalent do notation:
chained str = do
    state1 <- get
    let check1 = len str state1
    _ <- put (if check1 then state1 else state1 * 2)
    state2 <- get
    let check2 = (len str state2)
    return (check1 || check2)
First of all, your example is overly complicated because it doesn't need to store the val in the state monad; only the seed is the persistent state. Second, I think you will have better luck if instead of using the standard state monad, you re-implement all of the state monad and its operations yourself, with their types. I think you will learn more this way. Here are a couple of declarations to get you started:
data MyState s a = MyState (s -> (s, a))
get :: MyState s s
put :: s -> MyState s ()
Then you can write your own connectives:
unit :: a -> MyState s a
bind :: MyState s a -> (a -> MyState s b) -> MyState s b
Finally
data Seed = Seed Int
nextVal :: MyState Seed Bool
As for your trouble desugaring, the do notation you are using is pretty sophisticated.
But desugaring is a line-at-a-time mechanical procedure. As near as I can make out, your code should desugar like this (going back to your original types and code, which I disagree with):
nextVal = get >>= \(Random seed val) ->
          let seed' = updateSeed seed
              val' = even seed'
          in put (Random seed' val') >>= \_ -> return val'
In order to make the nesting structure a bit clearer, I've taken major liberties with the indentation.
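To tie the two points together (keep only the seed in the state, and desugar line by line), here is a minimal sketch against the standard State monad from mtl rather than a hand-written MyState. The names nextVal' and the seed-only getNRand are variations written for this example, not code from the question.

```haskell
import Control.Monad (replicateM)
import Control.Monad.State (State, evalState, get, put)

type Seed = Int

-- Same LCG update rule as in the question.
updateSeed :: Seed -> Seed
updateSeed seed = (1664525 * seed + 1013904223) `mod` 2 ^ 32

-- do-notation version: the state is just the seed.
nextVal :: State Seed Bool
nextVal = do
  seed <- get
  let seed' = updateSeed seed
  put seed'
  return (even seed')

-- The same computation with the do block expanded by hand:
-- "x <- m" becomes "m >>= \x ->", a bare statement becomes
-- "m >>= \_ ->", and "let" becomes "let ... in".
nextVal' :: State Seed Bool
nextVal' =
  get >>= \seed ->
  let seed' = updateSeed seed
  in put seed' >>= \_ ->
     return (even seed')

getNRand :: Int -> Seed -> [Bool]
getNRand n = evalState (replicateM n nextVal)
```

replicateM n nextVal simply sequences n copies of nextVal, so each copy sees the seed that the previous one stored with put. Replacing nextVal by nextVal' in getNRand produces the same list for any seed.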
You've got a couple great responses. What I do when working with the State monad is in my mind replace State s a with s -> (s,a) (after all, that's really what it is).
You then get a type for bind that looks like:
(>>=) :: (s -> (s,a)) ->
         (a -> s -> (s,b)) ->
         (s -> (s,b))
and you see that bind is just a specialized kind of function composition operator, like (.)
I wrote a blog/tutorial on the state monad here. It's probably not particularly good, but helped me grok things a little better by writing it.

Resources