Haskell: Elegant solution for integer parsing

Haskell: Elegant solution for integer parsing - string

I want to write a function that gets the first integer in a string, if the integer is at the beginning of the string and is followed by space or nothing.
For example, "12" and "12 kids" would be valid strings, but "12kids", "a12" are invalid.
This is my function:
getFirstInteger :: String -> Int
getFirstInteger [] = error "No integer"
getFirstInteger str
| dropWhile (Char.isNumber) str == [] = read str :: Int
| Char.isSpace $ head $ dropWhile (Char.isNumber) str = read (takeWhile (Char.isNumber) str) :: Int
| otherwise = error "No integer found"
The problem is that if the string is actually an integer, head will raise an exception, so that is why the first condition exists. Is there any elegant solution to this problem?

getFirstInteger :: String -> Int
getFirstInteger = read . head . words
words will split the String into a list of Strings, which were originally separated by whitespace. head will take the first (or error if the original string was empty), and read will parse the string as usual (and error if the first word wasn't a valid Int).
However, I prefer a variant that doesn't use error on empty Strings or unparseable ones, e.g.
import Text.Read (readMaybe)
getFirstInteger :: String -> Maybe Int
getFirstInteger [] = Nothing
getFirstInteger xs = readMaybe . head . words $ xs
One could write this completely point-free with listToMaybe from Data.Maybe, but that's probably an overkill:
import Data.Maybe (listToMaybe)
import Text.Read (readMaybe)
import Control.Monad ((>=>))
getFirstInteger :: String -> Maybe Int
getFirstInteger = listToMaybe . words >=> readMaybe

If you want to parse strings without using parser combinator libraries or any of the machinery in Text.Read, have a look at the functions break and span:
span :: (a -> Bool) -> [a] -> ([a], [a])
break :: (a -> Bool) -> [a] -> ([a], [a])
The nice thing is that both of these functions not only return what they match but also the remainder of the string allowing you to continue your parsing.
To solve your problem:
import Data.Char
getFirstInteger :: String -> Maybe Int
getFirstInteger str
let (digs, rest1) = span isDigit str
endsok = case rest1 of
[] -> True
(c:_) -> c == ' '
in
if not (null digs) && endsok
then Just (read digs)
else Nothing -- either no digits or doesn't end properly
This version does not allow a leading minus sign. This next version allows an optional leading minus sign for the integer:
getFirstInteger' str =
let (minus,rest1) = span (=='-') str
(digs, rest2) = span isDigit rest1
endsok = case rest2 of
[] -> True
(c:_) -> c == ' '
in
if length minus <= 1 && not (null digs) && endsok
then Just (read (minus ++ digs))
else Nothing
Yes - this does not terminate as early as possible on bad input. I provide it mainly as an example of how to chain together calls to span and break.

Use reads. For example:
type Unit = String
readUnit :: String -> Maybe (Int, Maybe Unit)
readUnit s = case reads s of -- the integer is at the beginning of the string and...
(n, ' ':unit):_ -> Just (n, Just unit) -- is followed by space...
(n, "" ):_ -> Just (n, Nothing) -- or nothing.
_ -> Nothing
In ghci:
> readUnit "12"
Just (12,Nothing)
> readUnit "12 kids"
Just (12,Just "kids")
> readUnit "12kids"
Nothing
> readUnit "a12"
Nothing
However, there are a few minor considerations to keep in mind. It's possible that read does not restrict the syntax as much as you might want; for example, the following answer may surprise you:
> readUnit " ((-0x5)) kids"
Just (-5,Just "kids")
You may also want to drop extraneous spaces in the unit; for example, you could change the first clause above to
(n, ' ':unit):_ -> Just (n, Just (dropWhile isSpace unit))
or similar. And as a final variation on this theme, note that while the standard instances of Read never return lists with more than one element from reads, it is technically possible that some user-supplied type may do so. So if you were ever to use reads to parse types other than Int, you may want to either demand an unambiguous parse or consider all the parses before deciding what to do; the above code bakes in the assumption that the first parse is as good as any.

Related

Monadic Parser - handling string with one character

I was reading this Monadic Parsing article while I was trying to implement a pretty simple string parser in Haskell and also get a better understanding of using monads. Down below you can see my code, implementing functions for matching a single character or a whole string. It works as expected, but I observed two strange behaviors that I can't explain.
I have to handle single characters in string, otherwise, the parser will return only empty lists. To be exact, if I remove this line string [c] = do char c; return [c] it won't work anymore. I was expecting that string (c:s) would handle string (c:[]) properly. What could be the cause here?
In my opinion, string definition should be equivalent to string s = mapM char s as it would create a list of [Parser Char] for each character in s and collect the results as Parser [Char]. If I use the definition based on mapM, the program would get stuck in an infinite loop and won't print anything. Is something about lazy evalutation that I miss here?
.
module Main where
newtype Parser a = Parser { apply :: String->[(a, String)] }
instance Monad Parser where
return a = Parser $ \s -> [(a, s)]
ma >>= k = Parser $ \s -> concat [apply (k a) s' | (a, s') <- apply ma s]
instance Applicative Parser where
pure = return
mf <*> ma = do { f <- mf; f <$> ma; }
instance Functor Parser where
fmap f ma = f <$> ma
empty :: Parser a
empty = Parser $ const []
anychar :: Parser Char
anychar = Parser f where
f [] = []
f (c:s) = [(c, s)]
satisfy :: (Char -> Bool) -> Parser Char
satisfy prop = do
c <- anychar
if prop c then return c
else empty
char :: Char -> Parser Char
char c = satisfy (== c)
string :: String -> Parser String
string [] = empty
string [c] = do char c; return [c] --- if I remove this line, all results will be []
string (c:s) = do char c; string s; return (c:s)
main = do
let s = "12345"
print $ apply (string "123") s
print $ apply (string "12") s
print $ apply (string "1") s
print $ apply (string []) s
PS. I think the title of the question is not suggestive enough, please propose an edit if you have a better idea.

Since you did string [] = empty instead of string [] = return [], you can't use it as a base case for recursion that builds up a list.
fmap f ma = f <$> ma is wrong, since <$> is defined in terms of fmap. If you want to define fmap in terms of your other instances, then do fmap = liftA or fmap = liftM. Since mapM uses fmap internally but your original string didn't, this problem didn't come up in your first simple test.

string [] = empty
means: "If you need to parse an empty string, fail -- it can not be parsed at all, no matter what's the input string".
By comparison,
string [] = return ""
means: "If you need to parse an empty string, succeed and return the empty string -- it can always be parsed, no matter what's the input string".
By using the first equation, when you recurse in the case string (c:cs) you need to stop at one character (string [c]) since reaching zero characters will run empty and make the whole parser fail.
Hence, you need to either use that string [c] = return [c] equation, or modify the base "empty string" case so that it succeeds. Arguably, the latter would be more natural.

How to use Maybe base correctly?

I write a code to convert a string to be a list of Intger:
convert::String -> Maybe [Int]
convert (c:cs)
isDigit c = Just(c: (convert cs):[])
otherwise = Nothing
and it shows an error...
test.hs:15:26: error:
parse error on input ‘=’
Perhaps you need a 'let' in a 'do' block?
e.g. 'let x = 5' instead of 'x = 5'
Why is that so...

While there are other compile errors in your code, the reason you're getting the error message about the parse error is because you are not including the pipe character used in guards.
convert (c:cs)
| isDigit c = Just(c: (convert cs):[])
| otherwise = Nothing

There were several errors in your code. You need to try to convert each character, which gives a Maybe Int. Then you loop on the string using mapM inside the Maybe monad :
import Data.Char
convertChar :: Char -> Maybe Int
convertChar c = if isDigit c then Just $ ord c else Nothing
convert :: String -> Maybe [Int]
convert = mapM convertChar

Another way of looking at V. Semeria's answer is to use sequence (from the Traversable type class) with map:
import Data.Char -- for ord
convertChar :: Char -> Maybe Int
convertChar c = if isDigit c then Just $ ord c - 48 else Nothing
convert :: String -> Maybe [Int]
-- Start with String -> [Maybe Int], then convert
-- the [Maybe Int] to Maybe [Int]
convert = sequence . map convertChar
This is a common pattern, so Traversable also provides traverse, which in this case is equivalent to sequence . map.
convert :: Traversable t => t Char -> Maybe (t Int)
convert = traverse converChar
Some examples (most of the instances of Traversable available by default aren't very interesting, but various tree types can have instances).
> convert "123"
Just [1,2,3]
> convert (Just '1')
Just (Just 1)
> convert (True, '2')
Just (True, 2)
> convert (Right '1')
Just (Right 1)

Does Maybe MonadPlus Parsers need to be in certain order?

Im working through the exercises on wikibooks/haskell and there is an exercise in the MonadPlus-chapter that wants you to write this hexChar function. My function works as shown below, but the thing is that when I try to switch the 2 helper parsers (digitParse and alphaParse) around the function ceases to work properly. If I switch them around I can only parse digits and not alphabetic chars anymore.
Why is this so?
char :: Char -> String -> Maybe (Char, String)
char c s = do
let (c':s') = s
if c == c' then Just (c, s') else Nothing
digit :: Int -> String -> Maybe Int
digit i s | i > 9 || i < 0 = Nothing
| otherwise = do
let (c:_) = s
if read [c] == i then Just i else Nothing
hexChar :: String -> Maybe (Char, String)
hexChar s = alphaParse s `mplus` digitParse s -- cannot switch these to parsers around!!
where alphaParse s = msum $ map ($ s) (map char (['a'..'f'] ++ ['A'..'F']))
digitParse s = do let (c':s') = s
x <- msum $ map ($ s) (map digit [0..9])
return (intToDigit x, s')

if read [c] == i then Just i else Nothing
The marked code has a flaw. You're using Int's Read instance, e.g. read :: String -> Int. But if it's not possible to parse [c] as an int (e.g. "a"), read will throw an exception:
> digit 1 "doesnt start with a digit"
*** Exception: Prelude.read: no parse
> -- other example
> (read :: String -> Int) "a"
*** Exception: Prelude.read: no parse
Instead, go the other way:
if [c] == show i then Just i else Nothing
This will always works, since show won't fail (not counting cases where bottom is involved).

Haskell "parse" not terminating for specific type of string

parse' :: Parser a -> String -> [(a,String)]
parse' p inp = p `with` inp
parse :: Parser a -> String -> [a]
parse p inp = [ v | (v,[]) <- parse' p inp ]
mkMany1 :: (Parser a -> Parser [a]) -> Parser a -> Parser [a]
mkMany1 many p = do x <- p
xs <- many p
return (x:xs)
many1L :: Parser a -> Parser [a]
many1L = mkMany1 manyL
manyL :: Parser a -> Parser [a]
manyL p = (many1L p) ||| (success [])
I'm trying to parse a String for a number of substrings that doesn't include the characters '<', '>' or ' '(space) but my parser doesn't seem to terminate. Can someone give me some pointers on what I'm missing?
textValid :: Char -> Bool
textValid c = c /= '<' && c /= '>' && not (isSpace c)
text :: Parser String
text = manyL (sat textValid)
When I try to run the following command, it never terminates.
parse (manyL text) "abc def <"

The problem is that manyL parser can succeed without consuming input (returning an empty list).
And one must not pass a parser that can succeed without consuming input as the argument of manyL, because in that case, you get precisely such an infinite loop as you are in.
After the first text consumed the "abc" prefix of the input, you are left with " def <" a String beginning with a space. So trying text on that, it consumes as many textValid characters as there are at the beginning of the String - namely 0 - and returns them - []. That leaves the same input. Now manyL text tries text another time to see if that succeeds too ...
You should probably define
text = many1L (sat textValid)
so that text doesn't succeed without consuming input, and probably it is a good idea to consume spaces from the beginning of the remaining input after each successful parse, like
text = do
result <- many1L (sat textValid)
skipSpaces
return result
(skipSpaces left to implement).

What is the inverse of this function?

flattern :: [(Char, Int)] -> String
flattern [] = ""
flattern ((w,l):xs) = show l ++ w : flattern xs
What would be the inverse function of this? Is there any way of be able to work this out?

It's not invertible:
There are strings that cannot be reproduced by this function (any string not starting with a digit).
It's not even partially invertible. There are also strings that correspond to multiple inputs: "1111" can be produced by either [('1',1),('1',1)] or [('1',111)].
Are you sure that this was the function to invert, and not something like flattern ((w,l):xs) = replicate l w ++ flattern xs?

If you really want, you could parse the output of the function to try and reconstruct what the arguments must have been.
import Text.ParserCombinators.Parsec
unflat1 :: Parser (Char, Int)
unflat1 = do
c <- anyChar
n <- many1 digit
return (c, read n)
readExpr :: String -> Either String [(Char, Int)]
readExpr input = case parse (many unflat1) "unflat" input of
Left err -> Left ("No match: " ++ show err)
Right val -> Right val
This sorta works, as long as flattern does not have numbers as the first input.
Something like flattern [('a',1), ('2',3)] will be parsed as [(a, 123)].

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Haskell: Elegant solution for integer parsing - string

Related

Monadic Parser - handling string with one character

How to use Maybe base correctly?

Does Maybe MonadPlus Parsers need to be in certain order?

Haskell "parse" not terminating for specific type of string

What is the inverse of this function?

Categories

Resources