Parsec permutation parsing - haskell

I wrote such permutation parsing example:
data Entry = Entry {
first_name :: String
, last_name :: String
, date_of_birth :: Maybe String
, nationality :: Maybe String
, parentage :: Maybe String
} deriving (Show)
nameParser :: Parser (String, String)
nameParser = do
first_name <- many1 upper
endOfLine
last_name <- many1 letter
endOfLine
return $ (first_name, last_name)
attributeParser :: String -> Parser String
attributeParser field = do
string $ field ++ ": "
value <- many1 (noneOf "\n")
endOfLine
return value
entryParser :: Parser Entry
entryParser = do
(f, l) <- nameParser
(d, n, p) <- permute ((,,)
<$?> (Nothing, liftM Just (try $ attributeParser "Date of Birth"))
<|?> (Nothing, liftM Just (try $ attributeParser "Nationality"))
<|?> (Nothing, liftM Just (try $ attributeParser "Parentage"))
)
return $ Entry f l d n p
main = do
mapM_ putStrLn . map (show . parse entryParser "") $ goodTests
goodTests =
"AAKVAAG\nTorvild\nDate of Birth: 1 July\nNationality: Norwegian\nParentage: business executive\n" :
"AAKVAAG\nTorvild\nNationality: Norwegian\nParentage: business executive\n" :
"AAKVAAG\nTorvild\nParentage: business executive\nNationality: Norwegian\n" :
"AAKVAAG\nTorvild\nParentage: business executive\n" :
"AAKVAAG\nTorvild\nNationality: Norwegian\n" : []
It would be good to extend Entry data with new fields in future, but doing that will require to put even more repetitive code in entryParser function. Is there a way to make this function accept list of parsers?
I started with this:
attributeParsers =
map attributeParser ["Date of Birth", "Nationality", "Parentage"]
permuteParams =
map (\p -> (Nothing, liftM Just (try p))) attributeParsers
But could not come of with correct way to fold permuteParams together with <|?> operator (I guess it would require something smarter than (,,) tuple constructor then).

As a first step, you can abstract the stuff you do for every component:
attr txt = (Nothing, liftM Just (try $ attributeParser txt))
With this, you can go to:
entryParser :: Parser Entry
entryParser = do
(f, l) <- nameParser
(d, n, p) <- permute ((,,)
<$?> attr "Date of Birth"
<|?> attr "Nationality"
<|?> attr "Parentage"
)
return $ Entry f l d n p
Then, if you want, you can combine the infix combinators and the attr calls:
f .$ x = f <$?> attr x
f .| x = f <|?> attr x
infixl 2 .$
infixl 2 .|
This gives you:
entryParser :: Parser Entry
entryParser = do
(f, l) <- nameParser
(d, n, p) <- permute ((,,)
.$ "Date of Birth"
.| "Nationality"
.| "Parentage"
)
return $ Entry f l d n p
Then you can further simplify by getting rid of the intermediate triple. All you're doing is to build it and then apply its components to Entry f l, so you can as well apply the result of the permutation parser to Entry f l directly:
entryParser :: Parser Entry
entryParser = do
(f, l) <- nameParser
permute (Entry f l
.$ "Date of Birth"
.| "Nationality"
.| "Parentage"
)
I think this is compact enough. If you really want some kind of fold, you'll either have to introduce an intermediate list and collect the permutation results in a list. This, however, only works as long as all the permutable attributes are of the same type (they currently are), and is not so nice because you'll make assumptions about the number of elements in this list. Or you'll have to use a heterogeneous list / some type class magic, which will lead to more complexity with the types and is, I think, not worth it here.

(<|?>) does not play nicely with folding because the type of the StreamPermParser you pass as its first argument isn't the same than that of the StreamPermParser result. For a simpler yet analogous issue, you would run into similar problems if you were trying to use (,,) with (<$>) and (<*>) in applicative style (e.g. (,,) <$> foo <*> bar <*> baz).
If you want to cut down some of the repetition, my prosaic suggestion would be using a local definition:
entryParser :: Parser Entry
entryParser = do
(f, l) <- nameParser
(d, n, p) <- permute ((,,)
<$?> optField "Date of Birth"
<|?> optField "Nationality"
<|?> optField "Parentage"
)
return $ Entry f l d n p
where
optField fieldName = (Nothing, liftM Just (try $ attributeParser fieldName))

Related

How far does "try" back track?

So ... I messed up a recording in CSV format:
23,95489,0,20,9888
Due to language settings floating point numbers were written with commas as seperator ... in a comma separated value file ...
Problem is that the file does not have a nice formatting for every float. Some have no point at all and the number of numbers behind the point varies too.
My idea was to build a MegaParsec parser that would try to read every possible floating point formatting, move on and if back track if it finds an error.
Eg for the example above:
read 23,95489 -> good
read 0,20 -> good (so far)
read 9888 -> error (because value is too high for column (checked by guard))
(back tracking to 2.) read 0 -> good again
read 20,9888 -> good
done
I've implemented that as (pseudo code here):
floatP = try pointyFloatP <|> unpointyFloatP
lineP = (,,) <$> floatP <* comma <*> floatP <* comma <*> floatP <* comma
My problem is that apparently the try only works in the 'current' float. There is no backtracking to previous positions. Is this correct?
And if so ... how would I go about implementing further back tracking?
How far does “try” back track?
The parser try p consumes exactly as much input as p if p parses successfully, otherwise it does not consume any input at all. So if you look at that in terms of backtracking, it backtracks to the point where you were when you invoked it.
My problem is that apparently the try only works in the 'current' float. There is no backtracking to previous positions. Is this correct?
Yes, try does not "unconsume" input. All it does is to recover from a failure in the parser you give it without consuming any input. It does not undo the effects of any parsers that you've applied previously, nor does it affect subsequent parsers that you apply after try p succeeded.
And if so ... how would I go about implementing further back tracking?
Basically what you want is to not only know whether pointyFloatP succeeds on the current input, but also whether the rest of your lineP would succeed after successfully pointyFloatP - and if it doesn't you want to backtrack back to before you applied pointyFloatP. So basically you want the parser for the whole remaining line in the try, not just the float parser.
To achieve that you can make floatP take the parser for the remaining line as an argument like this:
floatP restP = try (pointyFloatP <*> restP) <|> unpointyFloatP <*> restP
Note that this kind of backtracking isn't going to be very efficient (but I assume you knew that going in).
Update: Include a custom monadic parser for more complex rows.
Using the List Monad for Simple Parsing
The list monad makes a better backtracking "parser" than Megaparsec. For example, to parse the cells:
row :: [String]
row = ["23", "95489", "0", "20", "9888"]
into exactly three columns of values satisfying a particular bound (e.g., less than 30), you can generate all possible parses with:
{-# OPTIONS_GHC -Wall #-}
import Control.Monad
import Control.Applicative
rowResults :: [String] -> [[Double]]
rowResults = cols 3
where cols :: Int -> [String] -> [[Double]]
cols 0 [] = pure [] -- good, finished on time
cols 0 _ = empty -- bad, didn't use all the data
-- otherwise, parse exactly #n# columns from cells #xs#
cols n xs = do
-- form #d# from one or two cells
(d, ys) <- num1 xs <|> num2 xs
-- only accept #d < 30#
guard $ d < 30
ds <- cols (n-1) ys
return $ d : ds
-- read number from a single cell
num1 (x:xs) | ok1 x = pure (read x, xs)
num1 _ = empty
-- read number from two cells
num2 (x:y:zs) | ok1 x && ok2 y = pure (read (x ++ "." ++ y), zs)
num2 _ = empty
-- first cell: "0" is okay, but otherwise can't start with "0"
ok1 "0" = True
ok1 (c:_) | c /= '0' = True
ok1 _ = False
-- second cell: can't end with "0" (or *be* "0")
ok2 xs = last xs /= '0'
The above list-based parser tries to reduce ambiguity by assuming that if "xxx,yyy" is a number, the "xxx" won't start with zeros (unless it's just "0"), and the "yyy" won't end with a zero (or, for that matter, be a single "0"). If this isn't right, just modify ok1 and ok2 as appropriate.
Applied to row, this gives the single unambiguous parse:
> rowResults row
[[23.95489,0.0,20.9888]]
Applied to an ambiguous row, it gives all parses:
> rowResults ["0", "12", "5", "0", "8601"]
[[0.0,12.5,0.8601],[0.0,12.5,0.8601],[0.12,5.0,0.8601]]
Anyway, I'd suggest using a standard CSV parser to parse your file into a matrix of String cells like so:
dat :: [[String]]
dat = [ ["23", "95489", "0", "20", "9888"]
, ["0", "12", "5", "0", "8601"]
, ["23", "2611", "2", "233", "14", "422"]
]
and then use rowResults above get the row numbers of rows that were ambiguous:
> map fst . filter ((>1) . snd) . zip [1..] . map (length . rowResults) $ dat
[2]
>
or unparsable:
> map fst . filter ((==0) . snd) . zip [1..] . map (length . rowResults) $ dat
[]
>
Assuming there are no unparsable rows, you can regenerate one possible fixed file, even if some rows are ambiguous, but just grabbing the first successful parse for each row:
> putStr $ unlines . map (intercalate "," . map show . head . rowResults) $ dat
23.95489,0.0,20.9888
0.0,12.5,0.8601
23.2611,2.233,14.422
>
Using a Custom Monad based on the List Monad for More Complex Parsing
For more complex parsing, for example if you wanted to parse a row like:
type Stream = [String]
row0 :: Stream
row0 = ["Apple", "15", "1", "5016", "2", "5", "3", "1801", "11/13/2018", "X101"]
with a mixture of strings and numbers, it's actually not that difficult to write a monadic parser, based on the list monad, that generates all possible parses.
The key idea is to define a parser as a function that takes a stream and generates a list of possible parses, with each possible parse represented as a tuple of the object successfully parsed from the beginning of the stream paired with the remainder of the stream. Wrapped in a newtype, our parallel parser would look like:
newtype PParser a = PParser (Stream -> [(a, Stream)]) deriving (Functor)
Note the similarity to the type ReadS from Text.ParserCombinators.ReadP, which is also technically an "all possible parses" parser (though you usually only expect one, unambiguous parse back from a reads call):
type ReadS a = String -> [(a, String)]
Anyway, we can define a Monad instance for PParser like so:
instance Applicative PParser where
pure x = PParser (\s -> [(x, s)])
(<*>) = ap
instance Monad PParser where
PParser p >>= f = PParser $ \s1 -> do -- in list monad
(x, s2) <- p s1
let PParser q = f x
(y, s3) <- q s2
return (y, s3)
There's nothing too tricky here: pure x returns a single possible parse, namely the result x with an unchanged stream s, while p >>= f applies the first parser p to generate a list of possible parses, takes them one by one within the list monad to calculate the next parser q to use (which, as per usual for a monadic operation, can depend on the result of the first parse), and generates a list of possible final parses that are returned.
The Alternative and MonadPlus instances are pretty straightforward -- they just lift emptiness and alternation from the list monad:
instance Alternative PParser where
empty = PParser (const empty)
PParser p <|> PParser q = PParser $ \s -> p s <|> q s
instance MonadPlus PParser where
To run our parser, we have:
parse :: PParser a -> Stream -> [a]
parse (PParser p) s = map fst (p s)
and now we can introduce primitives:
-- read a token as-is
token :: PParser String
token = PParser $ \s -> case s of
(x:xs) -> pure (x, xs)
_ -> empty
-- require an end of stream
eof :: PParser ()
eof = PParser $ \s -> case s of
[] -> pure ((), s)
_ -> empty
and combinators:
-- combinator to convert a String to any readable type
convert :: (Read a) => PParser String -> PParser a
convert (PParser p) = PParser $ \s1 -> do
(x, s2) <- p s1 -- for each possible String
(y, "") <- reads x -- get each possible full read
-- (normally only one)
return (y, s2)
and parsers for various "terms" in our CSV row:
-- read a string from a single cell
str :: PParser String
str = token
-- read an integer (any size) from a single cell
int :: PParser Int
int = convert (mfilter ok1 token)
-- read a double from one or two cells
dbl :: PParser Double
dbl = dbl1 <|> dbl2
where dbl1 = convert (mfilter ok1 token)
dbl2 = convert $ do
t1 <- mfilter ok1 token
t2 <- mfilter ok2 token
return $ t1 ++ "." ++ t2
-- read a double that's < 30
dbl30 :: PParser Double
dbl30 = do
x <- dbl
guard $ x < 30
return x
-- rules for first cell of numbers:
-- "0" is okay, but otherwise can't start with "0"
ok1 :: String -> Bool
ok1 "0" = True
ok1 (c:_) | c /= '0' = True
ok1 _ = False
-- rules for second cell of numbers:
-- can't be "0" or end in "0"
ok2 :: String -> Bool
ok2 xs = last xs /= '0'
Then, for a particular row schema, we can write a row parser as we normally would with a monadic parser:
-- a row
data Row = Row String Int Double Double Double
Int String String deriving (Show)
rowResults :: PParser Row
rowResults = Row <$> str <*> int <*> dbl30 <*> dbl30 <*> dbl30
<*> int <*> str <*> str <* eof
and get all possible parses:
> parse rowResults row0
[Row "Apple" 15 1.5016 2.0 5.3 1801 "11/13/2018" "X101"
,Row "Apple" 15 1.5016 2.5 3.0 1801 "11/13/2018" "X101"]
>
The full program is:
{-# LANGUAGE DeriveFunctor #-}
{-# OPTIONS_GHC -Wall #-}
import Control.Monad
import Control.Applicative
type Stream = [String]
newtype PParser a = PParser (Stream -> [(a, Stream)]) deriving (Functor)
instance Applicative PParser where
pure x = PParser (\s -> [(x, s)])
(<*>) = ap
instance Monad PParser where
PParser p >>= f = PParser $ \s1 -> do -- in list monad
(x, s2) <- p s1
let PParser q = f x
(y, s3) <- q s2
return (y, s3)
instance Alternative PParser where
empty = PParser (const empty)
PParser p <|> PParser q = PParser $ \s -> p s <|> q s
instance MonadPlus PParser where
parse :: PParser a -> Stream -> [a]
parse (PParser p) s = map fst (p s)
-- read a token as-is
token :: PParser String
token = PParser $ \s -> case s of
(x:xs) -> pure (x, xs)
_ -> empty
-- require an end of stream
eof :: PParser ()
eof = PParser $ \s -> case s of
[] -> pure ((), s)
_ -> empty
-- combinator to convert a String to any readable type
convert :: (Read a) => PParser String -> PParser a
convert (PParser p) = PParser $ \s1 -> do
(x, s2) <- p s1 -- for each possible String
(y, "") <- reads x -- get each possible full read
-- (normally only one)
return (y, s2)
-- read a string from a single cell
str :: PParser String
str = token
-- read an integer (any size) from a single cell
int :: PParser Int
int = convert (mfilter ok1 token)
-- read a double from one or two cells
dbl :: PParser Double
dbl = dbl1 <|> dbl2
where dbl1 = convert (mfilter ok1 token)
dbl2 = convert $ do
t1 <- mfilter ok1 token
t2 <- mfilter ok2 token
return $ t1 ++ "." ++ t2
-- read a double that's < 30
dbl30 :: PParser Double
dbl30 = do
x <- dbl
guard $ x < 30
return x
-- rules for first cell of numbers:
-- "0" is okay, but otherwise can't start with "0"
ok1 :: String -> Bool
ok1 "0" = True
ok1 (c:_) | c /= '0' = True
ok1 _ = False
-- rules for second cell of numbers:
-- can't be "0" or end in "0"
ok2 :: String -> Bool
ok2 xs = last xs /= '0'
-- a row
data Row = Row String Int Double Double Double
Int String String deriving (Show)
rowResults :: PParser Row
rowResults = Row <$> str <*> int <*> dbl30 <*> dbl30 <*> dbl30
<*> int <*> str <*> str <* eof
row0 :: Stream
row0 = ["Apple", "15", "1", "5016", "2", "5", "3", "1801", "11/13/2018", "X101"]
main = print $ parse rowResults row0
Off-the-shelf Solutions
I find it a little surprising I can't find an existing parser library out there that provides this kind of "all possible parses" parser. The stuff in Text.ParserCombinators.ReadP takes the right approach, but it assumes that you're parsing characters from a String rather than arbitrary tokens from some other stream (in our case, Strings from a [String]).
Maybe someone else can point out an off-the-shelf solution that would save you from having to role your own parser type, instances, and primitives.

Reading multiline user's input

I want to lazily read user input and do something with it line by line. But if user ends a line with , (comma) followed by any number of spaces (including zero), I want give him opportunity to finish his input on the next line.
And here is what I've got:
import System.IO
import Data.Char
chop :: String -> [String]
chop = f . map (++ "\n") . lines
where f [] = []
f [x] = [x]
f (x : y : xs) = if (p . tr) x
then f ((x ++ y) : xs)
else x : f (y : xs)
p x = (not . null) x && ((== ',') . last) x
tr xs | all isSpace xs = ""
tr (x : xs) = x :tr xs
main :: IO ()
main =
do putStrLn "Welcome to hell, version 0.1.3!"
putPrompt
mapM_ process . takeWhile (/= "quit\n") . chop =<< getContents
where process str = putStr str >> putPrompt
putPrompt = putStr ">>> " >> hFlush stdout
Sorry, it doesn't work at all. Bloody mess.
P.S. I want to preserve \n characters on end of every chunk. Currently I add them manually with map (++ "\n") after lines.
How about changing the type of chop a little:
readMultiLine :: IO [String]
readMultiLine = do
ln <- getLine
if (endswith (rstrip ln) ",") then
liftM (ln:) readMultiLine
else
return [ln]
Now you know that if the last list is not empty, then the user didn't finish typing (the last input ended with ',').
Of course, either import Data.String.Utils, or write your own. Could be as simple as:
endswith xs ys = (length xs >= length ys)
&& (and $ zipWith (==) (reverse xs) (reverse ys))
rstrip = reverse . dropWhile isSpace . reverse
But I missed the point at first. Here's the actual thing.
unfoldM :: (Monad m) => (a -> Maybe (m b, m a)) -> a -> m [b]
unfoldM f z = case f z of
Nothing -> return []
Just (x, y) -> liftM2 (:) x $ y >>= unfoldM f
main = unfoldM (\x -> if (x == ["quit"]) then Nothing
else Just (print x, readMultiLine)) =<< readMultiLine
The reason is, you need to be able to insert the "action" to be done on input between reading one multi-line input and the next. Here print x is the action inserted between two readMultiLine
Since you have questions about getContents, let me add. Even though getContents provides a lazy String, its effectful changes to the world are ordered with the subsequent effects of processing the list. But the processing of the list attempts to insert effects between effects of reading particular list items. To do that, you need a function that exposes the chain of effects, so you can insert your own effects between them.
You can do this using pipes, preserving the laziness of the user's input
import Data.Char (isSpace)
import Pipes
import qualified Pipes.Prelude as Pipes
endsWithComma :: String -> Bool
endsWithComma str =
case (dropWhile isSpace $ reverse str) of
',':_ -> True
_ -> False
finish :: Monad m => Pipe String String m ()
finish = do
str <- await
yield str
if endsWithComma str
then do
str' <- await
yield str'
else finish
user :: Producer String IO ()
user = Pipes.stdinLn >-> finish
You can then hook up the user Producer to any downstream Consumer. For example, to echo the stream back out you can write:
main = runEffect (user >-> Pipes.stdoutLn)
To learn more about pipes you can read the tutorial.
Sorry, I wrote something wrong in a comment and I thought that now that I understood what you were trying to do, I'd give an answer with a little more substance. The core idea is that you're going to need a state buffer while you loop through the string, as far as I can tell. You have f :: [String] -> [String] but you'll need an extra string of buffer before you can solve this puzzle.
So let me assume an answer which looks like:
chop = joinCommas "" . map (++ "\n") . lines
Then the structure of joinCommas is going to look like:
import Data.List (isSuffixOf)
-- override with however you want to handle the ",\n" between lines.
joinLines = (++)
incomplete = isSuffixOf ",\n"
joinCommas :: String -> [String] -> [String]
joinCommas prefix (line : rest)
| incomplete prefix = joinCommas (joinLines prefix line) rest
| otherwise = prefix : joinCommas line rest
joinCommas prefix []
| incomplete prefix = error "Incomplete input"
| otherwise = [prefix]
The prefix stores up lines until it doesn't end with ",\n" at which point it emits the prefix and continues with the rest of the lines. On EOF we process the last line unless that line is incomplete.

Non-exhaustive patterns in lambda

I am getting Non-exhaustive patterns in lambda. I am not sure of the cause yet. Please anyone how to fix it. The code is below:
import Control.Monad
import Data.List
time_spent h1 h2 = max (abs (fst h1 - fst h2)) (abs (snd h1 - snd h2))
meeting_point xs = foldl' (find_min_time) maxBound xs
where
time_to_point p = foldl' (\tacc p' -> tacc + (time_spent p p')) 0 xs
find_min_time min_time p = let x = time_to_point p in if x < min_time then x else min_time
main = do
n <- readLn :: IO Int
points <- fmap (map (\[x,y] -> (x,y)) . map (map (read :: String->Int)) . map words . lines) getContents
putStrLn $ show $ meeting_point points
This is the lambda with the non-exhaustive patterns: \[x,y] -> (x,y).
The non-exhaustive pattern is because the argument you've specified, [x,y] doesn't match any possible list - it only matches lists with precisely two elements.
I would suggest replacing it with a separate function with an error case to print out the unexpected data in an error message so you can debug further, e.g.:
f [x,y] = (x, y)
f l = error $ "Unexpected list: " ++ show l
...
points <- fmap (map f . map ...)
As an addition to #GaneshSittampalam's answer, you could also do this with more graceful error handling using the Maybe monad, the mapM function from Control.Monad, and readMaybe from Text.Read. I would also recommend refactoring your code so that the parsing is its own function, it makes your main function much cleaner and easier to debug.
import Control.Monad (mapM)
import Text.Read (readMaybe)
toPoint :: [a] -> Maybe (a, a)
toPoint [x, y] = Just (x, y)
toPoint _ = Nothing
This is just a simple pattern matching function that returns Nothing if it gets a list with length not 2. Otherwise it turns it into a 2-tuple and wraps it in Just.
parseData :: String -> Maybe [(Int, Int)]
parseData text = do
-- returns Nothing if a non-Int is encountered
values <- mapM (mapM readMaybe . words) . lines $ text
-- returns Nothing if a line doesn't have exactly 2 values
mapM toPoint values
Your parsing can actually be simplified significantly by using mapM and readMaybe. The type of readMaybe is Read a => String -> Maybe a, and in this case since we've specified the type of parseData to return Maybe [(Int, Int)], the compiler can infer that readMaybe should have the local type of String -> Maybe Int. We still use lines and words in the same way, but now since we use mapM the type of the right hand side of the <- is Maybe [[Int]], so the type of values is [[Int]]. What mapM also does for us is if any of those actions fails, the overall computation exits early with Nothing. Then we simply use mapM toPoint to convert values into a list of points, but also with the failure mechanism built in. We actually could use the more general signature of parseData :: Read a => String -> Maybe [(a, a)], but it isn't necessary.
main = do
n <- readLn :: IO Int
points <- fmap parseData getContents
case points of
Just ps -> print $ meeting_point ps
Nothing -> putStrLn "Invalid data!"
Now we just use fmap parseData on getContents, making points have the type Maybe [(Int, Int)]. Finally, we pattern match on points to print out the result of the meeting_point computation or print a helpful message if something went wrong.
If you wanted even better error handling, you could leverage the Either monad in a similar fashion:
toPoint :: [a] -> Either String (a, a)
toPoint [x, y] = Right (x, y)
toPoint _ = Left "Invalid number of points"
readEither :: Read a => String -> Either String a
readEither text = maybe (Left $ "Invalid parse: " ++ text) Right $ readMaybe text
-- default value ^ Wraps output on success ^
-- Same definition with different type signature and `readEither`
parseData :: String -> Either String [(Int, Int)]
parseData text = do
values <- mapM (mapM readEither . words) . lines $ text
mapM toPoint values
main = do
points <- fmap parseData getContents
case points of
Right ps -> print $ meeting_point ps
Left err -> putStrLn $ "Error: " ++ err

Parsec permutation parser with separators

I want to parse assembly programs. I have a fixed format for parsing an assembly address: [ register + offset + label ] I implemented parsers for registers, offsets and labels. Now I want to create a parser which parses the whole address.
The combinations I want to accept:
[register]
[offset]
[label]
[register + offset]
[register + label]
[offset + label]
[register + offset + label]
And what I don't want to accept:
[]
[register offset]
[register + ]
...
Of course the simple solution is to have something like:
choice $ try (parseRegister >>= \r -> Address (Just r) Nothing Nothing)
<|> try ...
But it is ugly and does not scale well with more types of elements. So I'm looking for a cleaner solution.
If you reorder your table, you see it’s a series of choices:
[register + offset + label]
[register + offset ]
[register + label]
[register ]
[ offset + label]
[ offset ]
[ label]
The grammar for which might be written:
address = '[' (register ('+' offset-label)? | offset-label) ']'
offset-label = offset ('+' label)? | label
Which in Applicative style is pretty straightforward, made only slightly noisy by wrapping everything in constructors:
parseAddress :: Parser Address
parseAddress = do
(register, (offset, label)) <- between (char '[') (char ']') parseRegisterOffsetLabel
return $ Address register offset label
parseRegisterOffsetLabel :: Parser (Maybe Register, (Maybe Offset, Maybe Label))
parseRegisterOffsetLabel = choice
[ (,)
<$> (Just <$> parseRegister)
<*> option (Nothing, Nothing) (char '+' *> parseOffsetLabel)
, (,) Nothing <$> parseOffsetLabel
]
parseOffsetLabel :: Parser (Maybe Offset, Maybe Label)
parseOffsetLabel = choice
[ (,)
<$> (Just <$> parseOffset)
<*> option Nothing (char '+' *> (Just <$> parseLabel))
, (,) Nothing . Just <$> parseLabel
]
If we add a couple of utility functions:
plus :: Parser a -> Parser a
plus x = char '+' *> x
just :: Parser a -> Parser (Maybe a)
just = fmap Just
We can clean up these implementations a bit:
parseRegisterOffsetLabel = choice
[ (,)
<$> just parseRegister
<*> option (Nothing, Nothing) (plus parseOffsetLabel)
, (,) Nothing <$> parseOffsetLabel
]
parseOffsetLabel = choice
[ (,)
<$> just parseOffset
<*> option Nothing (plus (just parseLabel))
, (,) Nothing <$> just parseLabel
]
Then factor out the repetition, giving us a decent final solution:
parseChain begin def rest = choice
[ (,) <$> just begin <*> option def (plus rest)
, (,) Nothing <$> rest
]
parseRegisterOffsetLabel = parseChain
parseRegister (Nothing, Nothing) parseOffsetLabel
parseOffsetLabel = parseChain
parseOffset Nothing (just parseLabel)
I’ll let you take care of whitespace around + and inside [].
Something like that:
parsePlus = many1 (char ' ') >> char '+' >> many1 (char ' ')
parseRegisterModified = parsePlus >> parseOffsetLabel
parseOffsetModified = parsePlus >> parseLabel
parseRegister' = do
Address r _ _ <- parseRegister
optionMaybe parseRegisterModified >>=
return $ maybe
(Address r Nothing Nothing)
(\Address _ o l -> Address r o l)
parseOffset' = do
Address _ o _ <- parseOffset
optionMaybe parseOffsetModified >>=
return $ maybe
(Address Nothing o Nothing)
(\Address _ _ l -> Address Nothing o l)
parseOffsetLabel = try parseOffset' <|> parseLabel
parseAddress =
try parseRegister'
<|> parseOffset'
<|> parseLabel
I've been looking for something like that and found
Control.Applicative.Permutation from action-permutations. Though my case may scale independently from low-level platform.
In your case might look like
operand = do
(r, o, l) <- runPermsSep (char '+') $ (,,)
<$> maybeAtom register
<*> maybeAtom offset
<*> maybeAtom label
-- backtrack on inappropriate combination
when (null $ catMaybes [r, o, l]) . fail $ "operand expected"
return (r, o, l)
Note that you actually want optional permutation parser that requires at least one optional element to be present which makes your wanted parsers combinator pretty specific.
You could have more elegant solution using Monoids and sepBy1.
But it allows to write [register + register] (in our case adding them both)
parsePlus = many1 (char ' ') >> char '+' >> many1 (char ' ')
parseAddress1 =
try parseRegister
<|> parseOffset
<|> parseLabel
parseAddress = sepBy1 parsePlus parseAddress1 >>= return . mconcat
instance Monoid Address where
mempty = Address Nothing Nothing Nothing
Address r o l `mappend` Address r' o' l' =
Address (r `mappendA` r') (o `mappendA` o') (l `mappendA` l')
where
a `mappendA` a' = fmap getSum $ fmap Sum a `mappend` fmap Sum a'
Choosing Monoid (Sum a, First a, Last a) for r o l, we change the behavior:
Sum adds each other, First chooses first one, Last chooses the last one
... where
a `mappendA` a' = getFirst $ First a `mappend` First a'

awkward monad transformer stack

Solving a problem from Google Code Jam (2009.1A.A: "Multi-base happiness") I came up with an awkward (code-wise) solution, and I'm interested in how it could be improved.
The problem description, shortly, is: Find the smallest number bigger than 1 for which iteratively calculating the sum of squares of digits reaches 1, for all bases from a given list.
Or description in pseudo-Haskell (code that would solve it if elem could always work for infinite lists):
solution =
head . (`filter` [2..]) .
all ((1 `elem`) . (`iterate` i) . sumSquareOfDigitsInBase)
And my awkward solution:
By awkward I mean it has this kind of code: happy <- lift . lift . lift $ isHappy Set.empty base cur
I memoize results of the isHappy function. Using the State monad for the memoized results Map.
Trying to find the first solution, I did not use head and filter (like the pseudo-haskell above does), because the computation isn't pure (changes state). So I iterated by using StateT with a counter, and a MaybeT to terminate the computation when condition holds.
Already inside a MaybeT (StateT a (State b)), if the condition doesn't hold for one base, there is no need to check the other ones, so I have another MaybeT in the stack for that.
Code:
import Control.Monad.Maybe
import Control.Monad.State
import Data.Maybe
import qualified Data.Map as Map
import qualified Data.Set as Set
type IsHappyMemo = State (Map.Map (Integer, Integer) Bool)
isHappy :: Set.Set Integer -> Integer -> Integer -> IsHappyMemo Bool
isHappy _ _ 1 = return True
isHappy path base num = do
memo <- get
case Map.lookup (base, num) memo of
Just r -> return r
Nothing -> do
r <- calc
when (num < 1000) . modify $ Map.insert (base, num) r
return r
where
calc
| num `Set.member` path = return False
| otherwise = isHappy (Set.insert num path) base nxt
nxt =
sum . map ((^ (2::Int)) . (`mod` base)) .
takeWhile (not . (== 0)) . iterate (`div` base) $ num
solve1 :: [Integer] -> IsHappyMemo Integer
solve1 bases =
fmap snd .
(`runStateT` 2) .
runMaybeT .
forever $ do
(`when` mzero) . isJust =<<
runMaybeT (mapM_ f bases)
lift $ modify (+ 1)
where
f base = do
cur <- lift . lift $ get
happy <- lift . lift . lift $ isHappy Set.empty base cur
unless happy mzero
solve :: [String] -> String
solve =
concat .
(`evalState` Map.empty) .
mapM f .
zip [1 :: Integer ..]
where
f (idx, prob) = do
s <- solve1 . map read . words $ prob
return $ "Case #" ++ show idx ++ ": " ++ show s ++ "\n"
main :: IO ()
main =
getContents >>=
putStr . solve . tail . lines
Other contestants using Haskell did have nicer solutions, but solved the problem differently. My question is about small iterative improvements to my code.
Your solution is certainly awkward in its use (and abuse) of monads:
It is usual to build monads piecemeal by stacking several transformers
It is less usual, but still happens sometimes, to stack several states
It is very unusual to stack several Maybe transformers
It is even more unusual to use MaybeT to interrupt a loop
Your code is a bit too pointless :
(`when` mzero) . isJust =<<
runMaybeT (mapM_ f bases)
instead of the easier to read
let isHappy = isJust $ runMaybeT (mapM_ f bases)
when isHappy mzero
Focusing now on function solve1, let us simplify it.
An easy way to do so is to remove the inner MaybeT monad. Instead of a forever loop which breaks when a happy number is found, you can go the other way around and recurse only if the
number is not happy.
Moreover, you don't really need the State monad either, do you ? One can always replace the state with an explicit argument.
Applying these ideas solve1 now looks much better:
solve1 :: [Integer] -> IsHappyMemo Integer
solve1 bases = go 2 where
go i = do happyBases <- mapM (\b -> isHappy Set.empty b i) bases
if and happyBases
then return i
else go (i+1)
I would be more han happy with that code.
The rest of your solution is fine.
One thing that bothers me is that you throw away the memo cache for every subproblem. Is there a reason for that?
solve :: [String] -> String
solve =
concat .
(`evalState` Map.empty) .
mapM f .
zip [1 :: Integer ..]
where
f (idx, prob) = do
s <- solve1 . map read . words $ prob
return $ "Case #" ++ show idx ++ ": " ++ show s ++ "\n"
Wouldn't your solution be more efficient if you reused it instead ?
solve :: [String] -> String
solve cases = (`evalState` Map.empty) $ do
solutions <- mapM f (zip [1 :: Integer ..] cases)
return (unlines solutions)
where
f (idx, prob) = do
s <- solve1 . map read . words $ prob
return $ "Case #" ++ show idx ++ ": " ++ show s
The Monad* classes exist to remove the need for repeated lifting. If you change your signatures like this:
type IsHappyMemo = Map.Map (Integer, Integer) Bool
isHappy :: MonadState IsHappyMemo m => Set.Set Integer -> Integer -> Integer -> m Bool
This way you can remove most of the 'lift's. However, the longest sequence of lifts cannot be removed, since it is a State monad inside a StateT, so using the MonadState type class will give you the outer StateT, where you need tot get to the inner State. You could wrap your State monad in a newtype and make a MonadHappy class, similar to the existing monad classes.
ListT (from the List package) does a much nicer job than MaybeT in stopping the calculation when necessary.
solve1 :: [Integer] -> IsHappyMemo Integer
solve1 bases = do
Cons result _ <- runList . filterL cond $ fromList [2..]
return result
where
cond num = andL . mapL (isHappy Set.empty num) $ fromList bases
Some elaboration on how this works:
Had we used a regular list the code would had looked like this:
solve1 bases = do
result:_ <- filterM cond [2..]
return result
where
cond num = fmap and . mapM (isHappy Set.empty num) bases
This calculation happens in a State monad, but if we'd like to get the resulting state, we'd have a problem, because filterM runs the monadic predicate it gets for every element of [2..], an infinite list.
With the monadic list, filterL cond (fromList [2..]) represents a list that we can access one item at a time as a monadic action, so our monadic predicate cond isn't actually executed (and affecting the state) unless we consume the corresponding list items.
Similarly, implementing cond using andL makes us not calculate and update the state if we already got a False result from one of the isHappy Set.empty num calculations.

Resources