I'm trying to crudely replicate Parsec in Lua, and I'm having a bit of trouble with the bind function being recursive generating recursive runParsers.
function Parser:bind(f)
return new(function(s)
local result = self.runParser(s)
if result.cons() == Result.Success then
return f(result.get()).runParser(result.get(2))
else
return result
end
end)
end
I'm using a custom system of making ADTs, hence the cons() and get() functions on the return value. The equivalent Haskell code would be something like this.
m >>= f = Parser $ \s -> case result of
Success a cs -> runParser (f a) cs
_ -> result
where
result = runParser m s
The argument to the Parser constructor (the new function in Lua) is the runParser function. So calling a different runParser from within runParser non-tail-recursively generates very deep call stacks, which causes a stack overflow. Any tips on removing the recursion or translating it to tail-recursion?
Continuation passing made this very easy to solve.
function Parser:bind(f)
return new(function(s, cont)
return self.runParser(s, function(result)
if result.cons() == Result.Success then
return f(result.get()).runParser(result.get(2), cont)
else
return cont(result)
end
end)
end)
end
This way, it's tail calls all the way down! Admittedly, there's potential for f to overflow all on its own, but that would be a case of bad programming on the user's side, as f shouldn't go very deep at all.
Related
I'm really really struggling with understanding callCC. I get the power of Continuations and I've been using the concept in some of my projects to create cool concepts. But never have I needed to use something with greater capabilities than cont :: ((a->r)->r)-> Cont r a.
After using it, it makes a lot of sense why they call the Cont Monad the mother of all monads, YET, I don't get when would I need to use callCC, and that's exactly my question.
callCC gives you "early return" semantics, but in a monadic context.
Say you wanted to doOne, and if that returns True, you immediately stop, otherwise you go on to doTwo and doThree:
doOne :: Cont r Bool
doTwo :: Cont r ()
doThree :: Cont r ()
doThings :: Cont r ()
doThings = do
one <- doOne
if one
then pure ()
else do
doTwo
doThree
See that if branching there? One branch is not that bad, could be dealt with, but imagine there are multiple such points where you just want to bail? This gets very ugly very quickly.
With callCC you can have "early return": you bail at the branching point and don't have to nest the rest of the computation:
doThings = callCC \ret -> do
one <- doOne
when one $ ret ()
doTwo
doThree
Much more pleasant to read!
More importantly, since ret here is not a special syntax (like return in C-like languages), but just a value like any other, you can pass it to other functions too! And those functions can then perform what's called "non-local return" - i.e. they can "stop" the doThings computation, even from multiple nested calls deep. For example, I could factor out checking of the doOne's result into a separate function checkOne like this:
checkOne ret = do
one <- doOne
when one $ ret ()
doThings = callCC \ret -> do
checkOne ret
doTwo
doThree
Looking at the definition of getLine in the Haskell Prelude,
I get how the recursion works, where you keep asking for a character until you hit a newline and you buildup a list which you then return wrapped in an IO.
However my question is how do the return statements work in this case, specifically how does return (c:....:return "") work when you hit the base case. How do you cons a return "" on to a list?
return isn't a control structure like in most languages. It's a constructor for monadic values. Let's take a look at its type:
return :: Monad m => a -> m a
In this case, given a String value, it produces a IO String value.
The fact that return is the last expression evaluated in each branch of the if doesn't mean return ends execution; other expressions could occur after return. Consider this simple example from the list monad:
foo :: Int -> Int -> [Int]
foo x y = return x ++ return y
In the list monad, return simply creates a new single-item list containing its argument. Those two lists are then concatenated into the final result list returned by the function.
$ return 3 :: [Int]
[3]
$ foo 3 4
[3,4]
do-notation is a syntax sugar.
do x <- e
rest
is equivalent to
e >>= \x -> rest
where >>= is a flatMap or bind operation (it attaches a callback to IO container).
flatMap :: IO a -> (a -> IO b) -> IO b meaning is: given container of type IO a attach a callback of type a -> IO b, fired when container succeeds in its operation, and this produces a new container of type IO b
So
getLine =
getChar >>= \c ->
if c == '\n'
then (return [])
else getLine >>= \rest ->
return (c : rest)
What is means? getLine immediately delegates execution to getChar IO-container, with a callback, which analyses the character passed to it. If its a newline, it does "return """, which is a construction of IO-container, returning empty String immediately.
Otherwise, we call ourselves, grab the rest and return current character attached to rest.
P.S.: return is used to turn a pure value into container, since Monad interface doesn't allow us to bind non-container-producing callbacks (there are very good reasons for this).
Is it possible to break out of a monad sequence?
For instance, if I want to break out of a sequence earlier based on some condition calculated in the middle of the sequence. Say, in a 'do' notation I bind a value and based on the value I want to either finish the sequence or stop it. Is there something like a 'pass' function?
Thanks.
Directly using if
You could do this directly as Ingo beautifully encapsulated, or equivalently for example
breakOut :: a -> m (Either MyErrorType MyGoodResultType)
breakOut x = do
y <- dosomethingWith x
z <- doSomethingElseWith x y
if isNoGood z then return (Left (someerror z)) else do
w <- process z
v <- munge x y z
u <- fiddleWith w v
return (Right (greatResultsFrom u z))
This is good for simply doing something different based on what values you have.
Using Exceptions in the IO monad
You could use Control.Exception as Michael Litchard correctly pointed out. It has tons of error-handling, control-flow altering stuff in it, and is worth reading if you want to do something complex with this.
This is great if your error production could happen anywhere and your code is complex. You can handle the errors at the top level, or at any level you like. It's very flexible and doesn't mess with your return types. It only works in the IO monad.
import Control.Exception
Really I should roll my own custom type, but I can't be bothered deriving Typable etc, so I'll hack it with the standard error function and a few strings. I feel quite guilty about that.
handleError :: ErrorCall -> IO Int
handleError (ErrorCall msg) = case msg of
"TooBig" -> putStrLn "Error: argument was too big" >> return 10000
"TooSmall" -> putStrLn "Error: argument was too big" >> return 1
"Negative" -> putStrLn "Error: argument was too big" >> return (-1)
"Weird" -> putStrLn "Error: erm, dunno what happened there, sorry." >> return 0
The error handler needs an explicit type to be used in catch. I've flipped the argument to make the do block come last.
exceptOut :: IO Int
exceptOut = flip catch handleError $ do
x <- readLn
if (x < 5) then error "TooSmall" else return ()
y <- readLn
return (50 + x + y)
Monad transformers etc
These are designed to work with any monad, not just IO. They have the same benefits as IO's exceptions, so are officially great, but you need to learn about monad tranformers. Use them if your monad is not IO, and you have complex requirements like I said for Control.Exception.
First, read Gabriel Conzalez's Breaking from a loop for using EitherT to do two different things depending on some condition arising, or MaybeT for just stopping right there in the event of a problem.
If you don't know anything about Monad Transformers, you can start with Martin Grabmüller's Monad Transformers Step by Step. It covers ErrorT. After that read Breaking from a Loop again!
You might also want to read Real World Haskell chapter 19, Error handling.
Call/CC
Continuation Passing Style's callCC is remarkably powerful, but perhaps too powerful, and certainly doesn't produce terribly easy-to-follow code. See this for a fairly positive take, and this for a very negative one.
So what I think you're looking for is the equivalent of return in imperative languages, eg
def do_something
foo
bar
return baz if quux
...
end
Now in haskell this is doesn't work because a monadic chain is just one big function application. We have syntax that makes it look prettier but it could be written as
bind foo (bind bar (bind baz ...)))
and we can't just "stop" applying stuff in the middle. Luckily if you really need it there is an answer from the Cont monad. callCC. This is short for "call with current continuation" and generalizes the notation of returns. If you know Scheme, than this should be familiar.
import Control.Monad.Cont
foo = callCC $ \escape -> do
foo
bar
when baz $ quux >>= escape
...
A runnable example shamelessly stolen from the documentation of Control.Monad.Cont
whatsYourName name =
(`runCont` id) $ do
response <- callCC $ \exit -> do
validateName name exit
return $ "Welcome, " ++ name ++ "!"
return response
validateName name exit = do
when (null name) (exit "You forgot to tell me your name!")
and of course, there is a Cont transformer, ContT (which is absolutely mind bending) that will let you layer this on IO or whatever.
As a sidenote, callCC is a plain old function and completely nonmagical, implementing it is a great challenge
So I suppose there is no way of doing it the way I imagined it originally, which is equivalent of a break function in an imperative loop.
But I still get the same effect below based in Ingo's answer, which is pretty easy (silly me)
doStuff x = if x > 5
then do
t <- getTingFromOutside
doHeavyHalculations t
else return ()
I don't know though how it would work if I need to test 't' in the example above ...
I mean, if I need to test the bound value and make an if decision from there.
You can never break out of a "monad sequence", by definition. Remember that a "monad sequence" is nothing else than one function applied to other values/functions. Even if a "monad sequence" gives you the illusion that you could programme imperative, this is not true (in Haskell)!
The only thing you can do is to return (). This solution of the practical problem has already been named in here. But remember: it gives you only the illusion of being able to break out of the monad!
Scenario: I have an interpreter that builds up values bottom-up from an AST. Certain nodes come with permissions -- additional boolean expressions. Permission failures should propagate, but if a node above in the AST comes with a permission, a success can recover the computation and stop the propagation of the error.
At first I thought the Error MyError MyValue monad would be enough: one of the members of MyError could be PermError, and I could use catchError to recover from PermError if the second check succeeds. However, MyValue is gone by the time I get to the handler. I guess there could ultimately be a way of having PermError carry a MyValue field so that the handler could restore it, but it would probably be ugly and checking for an exception at each step would defeat the concept of an exceptional occurrence.
I'm trying to think of an alternative abstraction. Basically I have to return a datatype Either AllErrorsExceptPermError (Maybe PermError, MyValue) or more simply (Maybe AllErrors, MyValue) (the other errors are unrecoverable and fit the error monad pretty well) and I'm looking for something that would save me from juggling the tuple around, since there seems to be a common pattern in how the operations are chained. My haskell knowledge only goes so far. How would you use haskell to your advantage in this situation?
While I write this I came up with an idea (SO is a fancy rubber duck): a Monad that that handles internally a type (a, b) (and ultimately returns it when the monadic computation terminates, there has to be some kind of runMyMonad), but lets me work with the type b directly as much as possible. Something like
data T = Pass | Fail | Nothing
instance Monad (T , b) where
return v = (Nothing, v)
(Pass, v) >>= g = let (r', v') = g v in (if r' == Fail then Fail else Pass, v')
(Fail, v) >>= g = let (r', v') = g v in (if r' == Pass then Pass else Fail, v')
(Nothing, _) >>= g = error "This should not have been propagated, all chains should start with Pass or Fail"
errors have been simplified into T, and the instance line probably has a syntax error, but you should get the idea. Does this make sense?
I think you can use State monad for permissions and value calculation and wrap that inside ErrorT monad transformer to handle the errors. Below is such an example which shows the idea , here the calculation is summing up a list, permissions are number of even numbers in the list and error condition is when we see 0 in the list.
import Control.Monad.Error
import Control.Monad.State
data ZeroError = ZeroError String
deriving (Show)
instance Error ZeroError where
fun :: [Int] -> ErrorT ZeroError (State Int) Int
fun [] = return 0
fun (0:xs) = throwError $ ZeroError "Zero found"
fun (x:xs) = do
i <- get
put $ (if even(x) then i+1 else i)
z <- fun xs
return $ x+z
main = f $ runState (runErrorT $ fun [1,2,4,5,10]) 0
where
f (Left e,evens) = putStr $ show e
f (Right r,evens) = putStr $ show (r,evens)
I'm learning Haskell and as an exercise I'm trying to convert write the read_from function following code to Haskell. Taken from Peter Norvig's Scheme interpreter.
Is there a straightforward way do this?
def read(s):
"Read a Scheme expression from a string."
return read_from(tokenize(s))
parse = read
def tokenize(s):
"Convert a string into a list of tokens."
return s.replace('(',' ( ').replace(')',' ) ').split()
def read_from(tokens):
"Read an expression from a sequence of tokens."
if len(tokens) == 0:
raise SyntaxError('unexpected EOF while reading')
token = tokens.pop(0)
if '(' == token:
L = []
while tokens[0] != ')':
L.append(read_from(tokens))
tokens.pop(0) # pop off ')'
return L
elif ')' == token:
raise SyntaxError('unexpected )')
else:
return atom(token)
def atom(token):
"Numbers become numbers; every other token is a symbol."
try: return int(token)
except ValueError:
try: return float(token)
except ValueError:
return Symbol(token)
There is a straightforward way to "transliterate" Python into Haskell. This can be done by clever usage of monad transformers, which sounds scary, but it's really not. You see, due to purity, in Haskell when you want to use effects such as mutable state (e.g. the append and pop operations are performing mutation) or exceptions, you have to make it a little more explicit. Let's start at the top.
parse :: String -> SchemeExpr
parse s = readFrom (tokenize s)
The Python docstring said "Read a Scheme expression from a string", so I just took the liberty of encoding this as the type signature (String -> SchemeExpr). That docstring becomes obsolete because the type conveys the same information. Now... what is a SchemeExpr? According to your code, a scheme expression can be an int, float, symbol, or list of scheme expressions. Let's create a data type that represents these options.
data SchemeExpr
= SInt Int
| SFloat Float
| SSymbol String
| SList [SchemeExpr]
deriving (Eq, Show)
In order to tell Haskell that the Int we are dealing with should be treated as a SchemeExpr, we need to tag it with SInt. Likewise with the other possibilities. Let's move on to tokenize.
tokenize :: String -> [Token]
Again, the docstring turns into a type signature: turn a String into a list of Tokens. Well, what's a Token? If you look at the code, you'll notice that the left and right paren characters are apparently special tokens, which signal particular behaviors. Anything else is... unspecial. While we could create a data type to more clearly distinguish parens from other tokens, let's just use Strings, to stick a little closer to the original Python code.
type Token = String
Now let's try writing tokenize. First, let's write a quick little operator for making function chaining look a bit more like Python. In Haskell, you can define your own operators.
(|>) :: a -> (a -> b) -> b
x |> f = f x
tokenize s = s |> replace "(" " ( "
|> replace ")" " ) "
|> words
words is Haskell's version of split. However, Haskell has no pre-cooked version of replace that I know of. Here's one that should do the trick:
-- add imports to top of file
import Data.List.Split (splitOn)
import Data.List (intercalate)
replace :: String -> String -> String -> String
replace old new s = s |> splitOn old
|> intercalate new
If you read the docs for splitOn and intercalate, this simple algorithm should make perfect sense. Haskellers would typically write this as replace old new = intercalate new . splitOn old, but I used |> here for easier Python audience understanding.
Note that replace takes three arguments, but above I only invoked it with two. In Haskell you can partially apply any function, which is pretty neat. |> works sort of like the unix pipe, if you couldn't tell, except with more type safety.
Still with me? Let's skip over to atom. That nested logic is a bit ugly, so let's try a slightly different approach to clean it up. We'll use the Either type for a much nicer presentation.
atom :: Token -> SchemeExpr
atom s = Left s |> tryReadInto SInt
|> tryReadInto SFloat
|> orElse (SSymbol s)
Haskell doesn't have the automagical coersion functions int and float, so instead we will build tryReadInto. Here's how it works: we're going to thread Either values around. An Either value is either a Left or a Right. Conventionally, Left is used to signal error or failure, while Right signals success or completion. In Haskell, to simulate the Python-esque function call chaining, you just place the "self" argument as the last one.
tryReadInto :: Read a => (a -> b) -> Either String b -> Either String b
tryReadInto f (Right x) = Right x
tryReadInto f (Left s) = case readMay s of
Just x -> Right (f x)
Nothing -> Left s
orElse :: a -> Either err a -> a
orElse a (Left _) = a
orElse _ (Right a) = a
tryReadInto relies on type inference in order to determine which type it is trying to parse the string into. If the parse fails, it simply reproduces the same string in the Left position. If it succeeds, then it performs whatever function is desired and places the result in the Right position. orElse allows us to eliminate the Either by supplying a value in case the former computations failed. Can you see how Either acts as a replacement for exceptions here? Since the ValueExceptions in the Python code are always caught inside the function itself, we know that atom will never raise an exception. Similarly, in the Haskell code, even though we used Either on the inside of the function, the interface that we expose is pure: Token -> SchemeExpr, no outwardly-visible side effects.
OK, let's move on to read_from. First, ask yourself the question: what side effects does this function have? It mutates its argument tokens via pop, and it has internal mutation on the list named L. It also raises the SyntaxError exception. At this point, most Haskellers will be throwing up their hands saying "oh noes! side effects! gross!" But the truth is that Haskellers use side effects all the time as well. We just call them "monads" in order to scare people away and avoid success at all costs. Mutation can be accomplished with the State monad, and exceptions with the Either monad (surprise!). We will want to use both at the same time, so we'll in fact use "monad transformers", which I'll explain in a bit. It's not that scary, once you learn to see past the cruft.
First, a few utilities. These are just some simple plumbing operations. raise will let us "raise exceptions" as in Python, and whileM will let us write a while loop as in Python. For the latter, we simply have to make it explicit in what order the effects should happen: first perform the effect to compute the condition, then if it's True, perform the effects of the body and loop again.
import Control.Monad.Trans.State
import Control.Monad.Trans.Class (lift)
raise = lift . Left
whileM :: Monad m => m Bool -> m () -> m ()
whileM mb m = do
b <- mb
if b
then m >> whileM mb m
else return ()
We again want to expose a pure interface. However, there is a chance that there will be a SyntaxError, so we will indicate in the type signature that the result will be either a SchemeExpr or a SyntaxError. This is reminiscent of how in Java you can annotate which exceptions a method will raise. Note that the type signature of parse has to change as well, since it might raise the SyntaxError.
data SyntaxError = SyntaxError String
deriving (Show)
parse :: String -> Either SyntaxError SchemeExpr
readFrom :: [Token] -> Either SyntaxError SchemeExpr
readFrom = evalStateT readFrom'
We are going to perform a stateful computation on the token list that is passed in. Unlike the Python, however, we are not going to be rude to the caller and mutate the very list passed to us. Instead, we will establish our own state space and initialize it to the token list we are given. We will use do notation, which provides syntactic sugar to make it look like we're programming imperatively. The StateT monad transformer gives us the get, put, and modify state operations.
readFrom' :: StateT [Token] (Either SyntaxError) SchemeExpr
readFrom' = do
tokens <- get
case tokens of
[] -> raise (SyntaxError "unexpected EOF while reading")
(token:tokens') -> do
put tokens' -- here we overwrite the state with the "rest" of the tokens
case token of
"(" -> (SList . reverse) `fmap` execStateT readWithList []
")" -> raise (SyntaxError "unexpected close paren")
_ -> return (atom token)
I've broken out the readWithList portion into a separate chunk of code,
because I want you to see the type signature. This portion of code introduces
a new scope, so we simply layer another StateT on top of the monad stack
that we had before. Now, the get, put, and modify operations refer
to the thing called L in the Python code. If we want to perform these operations
on the tokens, then we can simply preface the operation with lift in order
to strip away one layer of the monad stack.
readWithList :: StateT [SchemeExpr] (StateT [Token] (Either SyntaxError)) ()
readWithList = do
whileM ((\toks -> toks !! 0 /= ")") `fmap` lift get) $ do
innerExpr <- lift readFrom'
modify (innerExpr:)
lift $ modify (drop 1) -- this seems to be missing from the Python
In Haskell, appending to the end of a list is inefficient, so I instead prepended, and then reversed the list afterwards. If you are interested in performance, then there are better list-like data structures you can use.
Here is the complete file: http://hpaste.org/77852
So if you're new to Haskell, then this probably looks terrifying. My advice is to just give it some time. The Monad abstraction is not nearly as scary as people make it out to be. You just have to learn that what most languages have baked in (mutation, exceptions, etc), Haskell instead provides via libraries. In Haskell, you must explicitly specify which effects you want, and controlling those effects is a little less convenient. In exchange, however, Haskell provides more safety so you don't accidentally mix up the wrong effects, and more power, because you are in complete control of how to combine and refactor effects.
In Haskell, you wouldn't use an algorithm that mutates the data it operates on. So no, there is no straightforward way to do that. However, the code can be rewritten using recursion to avoid updating variables. Solution below uses the MissingH package because Haskell annoyingly doesn't have a replace function that works on strings.
import Data.String.Utils (replace)
import Data.Tree
import System.Environment (getArgs)
data Atom = Sym String | NInt Int | NDouble Double | Para deriving (Eq, Show)
type ParserStack = (Tree Atom, Tree Atom)
tokenize = words . replace "(" " ( " . replace ")" " ) "
atom :: String -> Atom
atom tok =
case reads tok :: [(Int, String)] of
[(int, _)] -> NInt int
_ -> case reads tok :: [(Double, String)] of
[(dbl, _)] -> NDouble dbl
_ -> Sym tok
empty = Node $ Sym "dummy"
para = Node Para
parseToken (Node _ stack, Node _ out) "(" =
(empty $ stack ++ [empty out], empty [])
parseToken (Node _ stack, Node _ out) ")" =
(empty $ init stack, empty $ (subForest (last stack)) ++ [para out])
parseToken (stack, Node _ out) tok =
(stack, empty $ out ++ [Node (atom tok) []])
main = do
(file:_) <- getArgs
contents <- readFile file
let tokens = tokenize contents
parseStack = foldl parseToken (empty [], empty []) tokens
schemeTree = head $ subForest $ snd parseStack
putStrLn $ drawTree $ fmap show schemeTree
foldl is the haskeller's basic structured recursion tool and it serves the same purpose as your while loop and recursive call to read_from. I think the code can be improved a lot, but I'm not so used to Haskell. Below is an almost straight transliteration of the above to Python:
from pprint import pprint
from sys import argv
def atom(tok):
try:
return 'int', int(tok)
except ValueError:
try:
return 'float', float(tok)
except ValueError:
return 'sym', tok
def tokenize(s):
return s.replace('(',' ( ').replace(')',' ) ').split()
def handle_tok((stack, out), tok):
if tok == '(':
return stack + [out], []
if tok == ')':
return stack[:-1], stack[-1] + [out]
return stack, out + [atom(tok)]
if __name__ == '__main__':
tokens = tokenize(open(argv[1]).read())
tree = reduce(handle_tok, tokens, ([], []))[1][0]
pprint(tree)