Describing my Parser type as a series of monad transformers - haskell

I have to describe the Parser type as a series of monad transformers.
As far as I understand, monad transformers are used to wrap monads into another monad. But I don't understand what is the task here.

Instead of defining a new type for Parser, you can simply define it as a type alias for a type created by one or more monad transformers. That is, your definition would look something like
type Parser a = SomeMonadT <some set of monads and types>
Your task, then, is to determine which monad transformer(s) to use, and what the arguments to the transformer(s) should be.
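For instance, one plausible shape (a sketch, not necessarily the combination your exercise expects) is a parser that threads the unconsumed input and may fail:

import Control.Monad.State (StateT)

-- StateT String Maybe a unwraps to String -> Maybe (a, String):
-- the transformer threads the remaining input, and Maybe models failure.
type Parser a = StateT String Maybe a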

Before we begin to define combinators that act on parsers, we must choose a representation for a parser first.
A parser takes a string and produces an output that can be just about anything: a list parser will produce a list as its output, an integer parser will produce Ints, a JSON parser might return a custom ADT representing a JSON value.
Therefore, it makes sense to make Parser a polymorphic type. It also makes sense to return a list of results rather than a single result, since grammars can be ambiguous and there may be several ways to parse the same input string.
An empty list, then, implies the parser failed to parse the provided input.
newtype Parser a = Parser { parse :: String -> [(a, String)] }
You might wonder why we return the tuple (a, String), not just a. Well, a parser might not be able to parse the entire input string. Often, a parser is only intended to parse some prefix of the input, and let another parser do the rest of the parsing. Thus, we return a pair containing the parse result a and the unconsumed string subsequent parsers can use.
We can start by describing some basic parsers that do very little work. A result parser always succeeds in parsing without consuming the input string.
result :: a -> Parser a
result val = Parser $ \inp -> [(val, inp)]
item consumes and returns the first character of the input, failing only on the empty string.
item :: Parser Char
item = Parser parseItem
  where
    parseItem []       = []
    parseItem (x : xs) = [(x, xs)]
Let's try some of these parsers in GHCi:
*Main> parse (result 42) "abc"
[(42, "abc")]
*Main> parse item "abc"
[('a', "bc")]
Say we want a parser that consumes a string if its first character satisfies a predicate. We can generalize this idea by writing a function that takes a (Char -> Bool) predicate and returns a parser that only consumes an input string if its first character returns True when supplied to the predicate.
The simplest solution for this would be:
sat :: (Char -> Bool) -> Parser Char
sat p = Parser parseIfSat
  where
    parseIfSat (x : xs) = if p x then [(x, xs)] else []
    parseIfSat []       = []
Alternatively, using the previously defined item parser (this requires a Monad instance for the type Parser, which I leave to you as an exercise):
sat p =
  -- Apply `item`; if it fails on the empty string, we short-circuit and get `[]`.
  item >>= \x ->
    if p x
      then result x
      else zero
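The zero used above is the parser that always fails; it isn't defined in this snippet, but a minimal definition consistent with this representation would be:

zero :: Parser a
zero = Parser $ \_ -> []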
Now we can use the sat combinator to describe several useful parsers. For example, parser for ASCII digits:
-- import Data.Char (isDigit, isLower, isUpper)
digit :: Parser Char
digit = sat isDigit
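The other character classes imported above, plus a parser for one specific character, follow the same pattern (a small sketch building on sat):

lower, upper :: Parser Char
lower = sat isLower
upper = sat isUpper

char :: Char -> Parser Char
char c = sat (== c)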
You get the idea: you start by defining elementary parsers and use those to build more complex ones. The Parser type shown here is in fact the StateT monad transformer in disguise; in this case it combines State (threading the unconsumed String) with the list monad (allowing multiple parse results).
The code shown was taken from here.
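Concretely, here is a sketch of that transformer-based alias (my own rendering of the claim above, not code taken from the linked tutorial):

import Control.Monad.State (StateT (..))
import Data.Char (isDigit)

-- StateT String [] a unwraps to String -> [(a, String)],
-- which is exactly the hand-rolled Parser above.
type Parser a = StateT String [] a

result :: a -> Parser a
result = return

item :: Parser Char
item = StateT parseItem
  where
    parseItem []       = []
    parseItem (x : xs) = [(x, xs)]

-- The remaining combinators can be written exactly as before, using
-- the Monad instance that StateT provides for free.
sat :: (Char -> Bool) -> Parser Char
sat p = item >>= \x -> if p x then result x else StateT (const [])

digit :: Parser Char
digit = sat isDigit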

Related

Foldl-like operator for Parsec

Suppose I have a function of this type:
once :: (a, b) -> Parser (a, b)
Now, I would like to repeatedly apply this parser (somewhat like using >>=) and use its last output to feed it in the next iteration.
Using something like
sequence :: (a, b) -> Parser (a, b)
sequence inp = once inp >>= sequence
and supplying the initial values to the first parser doesn't work, because it would go on until it inevitably fails. Instead, I would like it to stop when it would otherwise fail (somewhat like many).
Trying to fix it using try makes the computation too complex (adding try in each iteration).
sequence :: (a, b) -> Parser (a, b)
sequence inp = try (once inp >>= sequence) <|> pure inp
In other words, I am looking for a function somewhat similar to foldl on Parsers, which stops when the next Parser would fail.
If your once parser fails immediately without consuming input, you don't need try. As a concrete example, consider a rather silly once parser that uses a pair of delimiters to parse the next pair of delimiters:
once :: (Char, Char) -> Parser (Char, Char)
once (c1, c2) = (,) <$ char c1 <*> anyChar <*> anyChar <* char c2
You can parse a nested sequence using:
onces :: (Char, Char) -> Parser (Char, Char)
onces inp = (once inp >>= onces) <|> pure inp
which works fine:
> parseTest (onces ('(',')')) "([])[{}]{xy}xabyDONE"
('a','b')
You only need try if your once might fail after parsing input. For example, the following won't parse without try:
> parseTest (onces ('(',')')) "([])[not valid]"
parse error at (line 1, column 8):
unexpected "t"
expecting "]"
because we start consuming input at the opening delimiter [ before discovering that not valid] doesn't parse.
(With try, it returns the correct ('[',']').)
All that being said, I have no idea how you came to the conclusion that using try makes the computation "too complex". If you are just guessing from something you've read about try being potentially inefficient, then you've misunderstood. try can cause problems if it's used in a manner that can result in a big cascade of backtracking. That's not a problem here -- at most, you're backtracking a single once, so don't worry about it.

Haskell: Understanding the bind operator (>>=) for Monads

I have the following function:
parse :: String -> Maybe Token
And I am trying to implement the following function:
maketokenlist :: String -> Maybe [Token]
The function returns Nothing if there is a token that was not able to be parsed (i.e. parse returns Nothing if the token is not an integer or an arithmetic operator), otherwise it returns a list of Tokens.
As Maybe is an instance of the Monad type class, I have the following approach:
maketokenlist str = return (words str) >>= parse
I convert the string into a list of individual tokens (e.g. "2 3 +" becomes ["2","3","+"]), and then map the parse function over each string in the list.
Since the Monad instance for lists is defined as:
instance Monad [] where
  return x = [x]
  xs >>= f = concat (map f xs)
  fail _   = []
However, say that I had the list of strings ["2", "3", "+", "a"], and after mapping parse over each element using >>= I get [Just 2, Just 3, Just (+), Nothing], as "a" cannot be parsed. Is there a way to make the function maketokenlist return Nothing using just the >>= operator? Any insights are appreciated.
If parse :: String -> Maybe Token, then:
traverse parse :: [String] -> Maybe [Token]
This version of traverse (which I have specialized to act on lists as the Traversable instance and Maybe as the Applicative instance) may indeed be implemented using (>>=):
listMaybeTraverse parse []     = pure []
listMaybeTraverse parse (s:ss) =
  parse s >>= \token ->
  listMaybeTraverse parse ss >>= \tokens ->
  pure (token : tokens)
I have chosen the names parse, s, and token to show the correspondence with your planned usage, but of course it will work for any appropriately-typed functions, not just parse.
The instance of Monad for lists does not make an appearance in this code.
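So, assuming the parse function from your question, maketokenlist can simply be (a sketch; the behaviour sketched in the comments assumes hypothetical token values):

maketokenlist :: String -> Maybe [Token]
maketokenlist = traverse parse . words

-- If parse "2" = Just <some token> and parse "a" = Nothing, then
--   maketokenlist "2 3 +"   = Just [<tokens for "2", "3", "+">]
--   maketokenlist "2 3 + a" = Nothing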

Converting Paulson's parser combinators to Haskell

I am trying to convert the code from Paulson's ML for the working programmer book chapter 9, Writing Interpreters for the λ-Calculus.
I was wondering if anyone can help me translate this to Haskell.
I'm struggling to understand the syntax.
fun list ph = ph -- repeat ("," $-- ph) >> (op::);
fun pack ph = "(" $-- list ph --$")" >> #1
| empty;
In porting this code to Haskell, I see two challenges: One is rewriting the combinators so they use the type Either SyntaxError rather than exceptions for flow control, and the other is preserving the modularity of ML's functors. That is, writing a parser combinator library that is modular with regards to what keywords / symbols / tokenizer it should use.
While the ML code has the two
functor Lexical (Keyword: KEYWORD) : LEXICAL
functor Parsing (Lex: LEXICAL) : PARSE
you could start by having
data Keyword = Keyword
  { alphas  :: [String]
  , symbols :: [String]
  }

data Token
  = Key String
  | Id String
  deriving (Show, Eq)

lex :: Keyword -> String -> [Token]
lex kw s = ...
  where
    alphaTok :: String -> Token
    alphaTok a | a `elem` alphas kw = Key a
               | otherwise          = Id a
    ...
The ML code uses the types string and substring while Haskell's String is actually a [Char]. The lexer functions would look a little different because ML's String.getc could simply be the pattern match c : ss1 in Haskell, etc.
Paulson's parsers have type [Token] → (τ, [Token]) but allow for exceptions. The Haskell parsers could have type [Token] → Either SyntaxError (τ, [Token]):
newtype SyntaxError = SyntaxError String
  deriving Show

newtype Parser a = Parser { runParser :: [Token] -> Either SyntaxError (a, [Token]) }

err :: String -> Either SyntaxError b
err msg = Left (SyntaxError msg)
The operators id, $, ||, !!, -- and >> need new names, since they collide with a bunch of built-in operators and single-line comments. Ideas for names could be: ident, kw, |||, +++ and >>>. I would skip implementing the !! operator initially.
Here are two combinators implemented a little differently,
ident :: Parser String
ident = Parser f
  where
    f :: [Token] -> Either SyntaxError (String, [Token])
    f (Id x : toks) = Right (x, toks)
    f (Key x : _)   = err $ "Identifier expected, got keyword '" ++ x ++ "'"
    f []            = err "Identifier expected, got EOF"

(+++) :: Parser a -> Parser b -> Parser (a, b)
(+++) pa pb = Parser $ \toks1 -> do (x, toks2) <- runParser pa toks1
                                    (y, toks3) <- runParser pb toks2
                                    return ((x, y), toks3)
...
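To fill in a couple more of the renamed combinators (my own sketch, following the naming ideas above: ||| for Paulson's ||, and >>> for his >>):

(|||) :: Parser a -> Parser a -> Parser a
(|||) pa pb = Parser $ \toks ->
  case runParser pa toks of
    Right res -> Right res
    Left _    -> runParser pb toks

(>>>) :: Parser a -> (a -> b) -> Parser b
(>>>) pa f = Parser $ \toks -> do
  (x, toks') <- runParser pa toks
  return (f x, toks')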
Some final remarks:
Read the paper Monadic Parsing in Haskell (Hutton, Meijer).
You may be interested in SimpleParse by Ken Friis Larsen, an educational parser combinator library that is a simplification of ReadP by Koen Claessen, since its source code is very easy to read. They are both non-deterministic.
If you're interested in using parser combinators in Haskell, rather than porting an old-fashioned library for the learning experience, I encourage you to look at Megaparsec (tutorial), a modern fork of Parsec. Its implementation is a little complex.
None of these three libraries (SimpleParse, ReadP, Megaparsec) split lexing and parsing into two separate steps. Rather, they simply build small tokenizing parsers that implicitly eat meaningless whitespace. (See the token combinator in SimpleParse, for example.) However, Megaparsec does allow an arbitrary token type, whether that is Char or some token you have lexed.

How to return a polymorphic type in Haskell based on the results of string parsing?

TL;DR:
How can I write a function which is polymorphic in its return type? I'm working on an exercise where the task is to write a function which is capable of analyzing a String and, depending on its contents, generate either a Vector [Int], Vector [Char] or Vector [String].
Longer version:
Here are a few examples of how the intended function would behave:
The string "1 2\n3 4" would generate a Vector [Int] that's made up of two lists: [1,2] and [3,4].
The string "'t' 'i' 'c'\n't' 'a' 'c'\n't' 'o' 'e'" would generate a Vector [Char] (i.e., made up of the lists "tic", "tac" and "toe").
The string "\"hello\" \"world\"\n\"monad\" \"party\"" would generate a Vector [String] (i.e., ["hello","world"] and ["monad","party"]).
Error-checking/exception handling is not a concern for this particular exercise. At this stage, all testing is done purely, i.e., this isn't in the realm of the IO monad.
What I have so far:
I have a function (and new datatype) which is capable of classifying a string. I also have functions (one for each Int, Char and String) which can convert the string into the necessary Vector.
My question: how can I combine these three conversion functions into a single function?
What I've tried:
It obviously doesn't typecheck if I stuff the three conversion functions into a single function (i.e., using a case..of structure to pattern match on the VectorType of the string).
I tried making a Vectorable class and defining a separate instance for each type; I quickly realized that this approach only works if the functions' arguments vary by type. In our case, the type of the argument doesn't vary (i.e., it's always a String).
My code:
A few comments
Parsing: the mySplitter object and the mySplit function handle the parsing. It's admittedly a crude parser based on the Splitter type and the split function from Data.List.Split.Internals.
Classifying: The classify function is capable of determining the final VectorType based on the string.
Converting: The toVectorNumber, toVectorChar and toVectorString functions are able to convert a string to type Vector [Int], Vector [Char] and Vector [String], respectively.
As a side note, I'm trying out CorePrelude based on a recommendation from a mentor. That's why you'll see me use the generalized versions of the normal Prelude functions.
Code:
import qualified Prelude
import CorePrelude
import Data.Foldable (concat, elem, any)
import Control.Monad (mfilter)
import Text.Read (read)
import Data.Char (isAlpha, isSpace)
import Data.List.Split (split)
import Data.List.Split.Internals (Splitter(..), DelimPolicy(..), CondensePolicy(..), EndPolicy(..), Delimiter(..))
import Data.Vector ()
import qualified Data.Vector as V
data VectorType = Number | Character | TextString deriving (Show)
mySplitter :: [Char] -> Splitter Char
mySplitter elts = Splitter { delimiter        = Delimiter [(`elem` elts)]
                           , delimPolicy      = Drop
                           , condensePolicy   = Condense
                           , initBlankPolicy  = DropBlank
                           , finalBlankPolicy = DropBlank }

mySplit :: [Char] -> [Char] -> [[Char]]
mySplit delims = split (mySplitter delims)

classify :: String -> VectorType
classify xs
  | '\"' `elem` cs = TextString
  | hasAlpha cs    = Character
  | otherwise      = Number
  where
    cs       = concat $ split (mySplitter "\n") xs
    hasAlpha = any isAlpha . mfilter (/= ' ')

toRows :: [Char] -> [[Char]]
toRows = mySplit "\n"

toVectorChar :: [Char] -> Vector [Char]
toVectorChar = let toChar = concat . mySplit " \'"
               in V.fromList . fmap toChar . toRows

toVectorNumber :: [Char] -> Vector [Int]
toVectorNumber = let toNumber = fmap (\x -> read x :: Int) . mySplit " "
                 in V.fromList . fmap toNumber . toRows

toVectorString :: [Char] -> Vector [[Char]]
toVectorString = let toString = mfilter (/= " ") . mySplit "\""
                 in V.fromList . fmap toString . toRows
You can't.
Covariant polymorphism is not supported in Haskell, and wouldn't be useful if it were.
That's basically all there is to answer. Now as to why this is so.
It's no good "returning a polymorphic value" like OO languages so like to do, because the only reason to return any value at all is to use it in other functions. Now, in OO languages you don't have functions but methods that come with the object, so it's quite easy to "return different types": each will have its suitable methods built-in, and they can per instance vary. (Whether that's a good idea is another question.)
But in Haskell, the functions come from elsewhere. They don't know about implementation changes for a particular instance, so the only way such functions can safely be defined is to know every possible implementation. But if your return type is really polymorphic, that's not possible, because polymorphism is an "open" concept (it allows new implementation varieties to be added any time later).
Instead, Haskell has a very convenient and totally safe mechanism of describing a closed set of "instances" – you've actually used it yourself already! ADTs.
data PolyVector = NumbersVector (Vector [Int])
| CharsVector (Vector [Char])
| StringsVector (Vector [String])
That's the return type you want. The function won't be polymorphic as such, it'll simply return a more versatile type.
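A caller then simply pattern-matches on the constructors; for example (a sketch of my own, assuming you build a parseVector :: String -> PolyVector from your classify and conversion functions):

describe :: PolyVector -> String
describe (NumbersVector v) = "numbers, " ++ show (length v) ++ " rows"
describe (CharsVector v)   = "chars, "   ++ show (length v) ++ " rows"
describe (StringsVector v) = "strings, " ++ show (length v) ++ " rows"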
If you insist it should be polymorphic
Now... actually, Haskell does have a way to sort-of deal with "polymorphic returns". As in OO when you declare that you return a subclass of a specified class. Well, you can't "return a class" at all in Haskell, you can only return types. But those can be made to express "any instance of...". It's called existential quantification.
{-# LANGUAGE GADTs #-}
data PolyVector' where
  PolyVector :: YourVElemClass e => Vector [e] -> PolyVector'

class YourVElemClass e where
  ...?

instance YourVElemClass Int
instance YourVElemClass Char
instance YourVElemClass String
I don't know if that looks intriguing to you. Truth is, it's much more complicated and rather harder to use; you can't just use any of the possible results directly, but can only work with the elements through the methods of YourVElemClass. GADTs can in some applications be extremely useful, but those usually involve classes with very deep mathematical motivation. YourVElemClass doesn't seem to have such a motivation, so you'll be much better off with a simple ADT alternative than with existential quantification.
There's a famous rant against existentials by Luke Palmer (note he uses another syntax, existential-specific, which I consider obsolete, as GADTs are strictly more general).
Easy, use a sum type!
data ParsedVector = NumberVector (Vector [Int]) | CharacterVector (Vector [Char]) | TextStringVector (Vector [String]) deriving (Show)

parse :: [Char] -> ParsedVector
parse cs = case classify cs of
  Number     -> NumberVector     $ toVectorNumber cs
  Character  -> CharacterVector  $ toVectorChar   cs
  TextString -> TextStringVector $ toVectorString cs

Trouble with the State Monad

I am trying to write a program to generate 'word chains', e.g. bat -> cat -> cot -> bot, using the list monad (mostly comprehensions) to generate combinations of words, and the state monad to build up the actual chain as I go through the possibilities. That second part is giving me trouble:
import Control.Monad.State

type Word = String
type Chain = [Word]

getNext :: Word -> State Chain Word
getNext word = do
  list <- get
  return (list ++ "current word")
The part where I generate words works and is given below, but as you can see I don't really know what I'm doing in this part. Basically wordVariations :: Word -> [Word] takes a Word and returns a list of Words that differ in one letter from the given word. I'm trying to change this so that each word has a state signifying its predecessors:
For example: input = "cat". the final value is "got", the final state is ["cat","cot","got"]
What I have now will give me "got" from "cat" after 3 steps, but won't tell me how it got there.
None of the State Monad tutorials I found online were terribly helpful. The above code, when compiled with GHC, gives the error:
WordChain.hs:42:11:
Couldn't match type `Word' with `Char'
When using functional dependencies to combine
MonadState s (StateT s m),
arising from the dependency `m -> s'
in the instance declaration in `Control.Monad.State.Class'
MonadState [Char] (StateT Chain Data.Functor.Identity.Identity),
arising from a use of `get' at WordChain.hs:42:11-13
In a stmt of a 'do' block: list <- get
In the expression:
do { list <- get;
return (list ++ "current word") }
Failed, modules loaded: none.
This is just meant to be a test to work off of, but I can't figure it out!
The code in full is below in case it is helpful. I know this may not be the smartest way to do this, but it is a good opportunity to learn about the state monad. I am open to necessary changes in the way the code works also, because I suspect that some major refactoring will be called for:
import Control.Monad.State
type Word = String
type Dict = [String]
-- data Chain = Chain [Word] deriving (Show)
replaceAtIndex :: Int -> a -> [a] -> [a]
replaceAtIndex n item ls = a ++ (item:b) where (a, (_:b)) = splitAt n ls
tryLetter str c = [replaceAtIndex n c str | n <- [0..(length str - 1)]]
wordVariations str = tryLetter str =<< ['a' .. 'z']
testDict :: Dict
testDict = ["cat","cog","cot","dog"]
------- implement state to record chain
type Chain = [Word] -- [cat,cot,got,tot], etc. state var.
startingState = [""] :: Chain
getNext :: Word -> State Chain Word
getNext w = do
  list <- get
  return (list ++ "current word")
First of all, the error you posted comes from this line: return (list ++ "current word").
The type of "current word" is Word, which is an alias for String, which is an alias for [Char].
The variable list has the type Chain, which is an alias for [Word], which is an alias for [[Char]].
The type signature of the function forces the return type to be a Word.
(++) requires that both sides be lists with the same element type, (++) :: [a] -> [a] -> [a].
However, if you plug in the above type signatures, you get the type [Word] -> [Char] -> [Char], which has mismatched "a"s.
Quick but fairly important performance side note: prepending to a list is much faster than appending, so you might want to consider building your chains backwards using (:) and reversing them at the end.
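For example, a minimal fix that typechecks (my own sketch, recording the current word by prepending it to the chain, per the note above; it relies on the Word/Chain aliases and the Control.Monad.State import from your code):

getNext :: Word -> State Chain Word
getNext w = do
  modify (w :)  -- record the word; reverse the final chain if you need it oldest-first
  return w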
The State Monad is not really the right choice for storing the steps used to get to the result. At the very least it is overkill here, since the List Monad alone is sufficient for the task. Consider:
-- given a list of words, find all possible subsequent lists of words
getNext :: [String] -> [[String]]
getNext words@(newest:_) = fmap (: words) (wordVariations newest)

-- lazily construct all chains of every length for every word
wordChains :: String -> [[[String]]]
wordChains word = chain
  where chain = [[word]] : map (>>= getNext) chain

-- all the 5 word long chains starting with the word "bat"
batchains = wordChains "bat" !! 4
(Disclaimer: code compiled, but not run).
getNext takes a Chain and returns a list of Chains, where each one has a different successor prepended. Since they share a common tail, this is memory efficient.
In wordChains, by repeatedly mapping (>>= getNext) over the list, you end up with a lazy infinite list, where the zeroth item contains just the one-word Chain [word], the first item is all 2-item Chains starting with that word, the second item is all 3-item Chains, and so on. If you just want one chain of length 5, you can grab the head of the 4th item, and it will do just enough computation to create that for you, but you can also get all of them.
Also, getNext could be expanded to not repeat words by filtering.
Finally, when it comes to finding wordVariations, another algorithm would be to construct a filter which returns True if the lengths of two words are the same and the number of characters that are different between them is exactly 1. Then you could filter over a dictionary, instead of trying every possible letter variation.
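For example (a sketch of my own; oneLetterApart is a hypothetical helper, and Dict and Word are the aliases from your code):

-- True iff the two words have equal length and differ in exactly one position.
oneLetterApart :: String -> String -> Bool
oneLetterApart xs ys =
  length xs == length ys
    && length (filter id (zipWith (/=) xs ys)) == 1

-- Filter a dictionary instead of trying every possible letter variation.
wordVariations' :: Dict -> Word -> [Word]
wordVariations' dict w = filter (oneLetterApart w) dict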
Good luck!
