Parsec if a match it found then throw error - haskell

I am trying to distinguish between Ints and floats in a parser. I have 2 parsers one for each int and float. However, I am having trouble getting into to fail on a '.'. I looked for negating and look ahead and didn't seem to get and fruits.
I hope I am not duplicating any questions.
I had it working with looking at the next character that is not a '.' but that is an ugly solution.
EDIT: Added more code.
--Int--------------------------------------------------------------------
findInt :: Parser String
findInt = plus <|> minus <|> number
number :: Parser String
number = many1 digit
plus :: Parser String
plus = char '+' *> number
minus :: Parser String
minus = char '-' <:> number
makeInt :: Parser Int
makeInt = prepareResult (findInt <* many (noneOf ".") <* endOfLine)
where readInt = read :: String -> Int
prepareResult = liftA readInt
makeInt2 :: Parser Int
makeInt2 = do
numberFound <- (findInt <* many (noneOf ".") <* endOfLine)
match <- char '.'
return (prepareResult numberFound)
where readInt = read :: String -> Int
prepareResult = readInt
--End Int----------------------------------------------------------------

I think you are best off actually combining the two parsers into one. Try something like this:
import Text.Parsec.String (Parser)
import Control.Applicative ((<|>))
import Text.Parsec.Char (char,digit)
import Text.Parsec.Combinator (many1,optionMaybe)
makeIntOrFloat :: Parser (Either Int Float)
makeIntOrFloat = do
sign <- optionMaybe (char '-' <|> char '+')
n <- many1 digit
m <- optionMaybe (char '.' *> many1 digit)
return $ case (m,sign) of
(Nothing, Just '-') -> Left (negate (read n))
(Nothing, _) -> Left (read n)
(Just m, Just '-') -> Right (negate (read n + read m / 10.0^(length m)))
(Just m, _) -> Right (read n + read m / 10.0^(length m))
ErikR has a correct solution, but the use of try means that parsec has to keep track of the possibility of backtracking (which is a bit inefficient) when in fact that is unnecessary in this case.
Here, the key difference is that we can actually tell right away if we have a float or not - if we don't have a float, the char '.' *> many1 digit parser in optionMaybe will fail immediately (without consuming input), so there is no need to consider backtracking.
At GHCi
ghci> import Text.Parsec.Prim
ghci> parseTest makeIntOrFloat "1234.012"
Right 1234.012
ghci> parseTest makeIntOrFloat "1234"
Left 1234

I would use notFollowedBy - e.g.:
import Text.Parsec
import Text.Parsec.String
import Text.Parsec.Combinator
int :: Parser String
int = many1 digit <* notFollowedBy (char '.')
float :: Parser (String,String)
float = do whole <- many1 digit
fracpart <- try (char '.' *> many digit) <|> (return "")
return (whole, fracpart)
intOrFloat :: Parser (Either String (String,String))
intOrFloat = try (fmap Left int) <|> (fmap Right float)
test1 = parseTest (intOrFloat <* eof) "123"
test2 = parseTest (intOrFloat <* eof) "123.456"
test3 = parseTest (intOrFloat <* eof) "123."

It is typically easiest to use applicative combinators to build your parsers - this makes your parsers easier to reason about and often you do not need monadic and backtracking functions of the parser.
For example, a parser for integers could be written as such:
import Text.Parsec hiding ((<|>), optional)
import Text.Parsec.String
import Numeric.Natural
import Control.Applicative
import Data.Foldable
natural :: Parser Natural
natural = read <$> many1 digit
sign :: Num a => Parser (a -> a)
sign = asum [ id <$ char '+'
, negate <$ char '-'
, pure id
]
integer :: Parser Integer
integer = sign <*> (fromIntegral <$> natural)
A decimal number is an integer optionally followed by a decimal portion (a '.' followed by another integer), which is itself a number proper, so your parser can be written as
decimalPart :: Parser Double
decimalPart = read . ("0."++) <$> (char '.' *> many1 digit)
integerOrDecimal :: Parser (Either Integer Double)
integerOrDecimal = liftA2 cmb integer (optional decimalPart) where
cmb :: Integer -> Maybe Double -> Either Integer Double
cmb x Nothing = Left x
cmb x (Just d) = Right (fromIntegral x + d)
The definition of cmb is obvious - if the is no decimal part, then produce an Integer, and if there is, produce a Double, by adding the integer part to the decimal part.
You can also define a parser for decimals in terms of the above:
decimal :: Parser Double
decimal = either fromIntegral id <$> integerOrDecimal
Note that none of the above parsers directly use monadic functions (i.e. >>=) or backtracking - making them simple and efficient.

Related

How to use parsec to get sub-strings of specific pattern in a string

I'm a newbie to Haskell, and now I'm learning to use parsec. I get stuck in one problem, that is, I want to get all the sub-strings which satisfies some specific pattern in a string. For example, from the following string,
"I want to choose F12 or F 12 from F1(a), F2a, F5-A, F34-5 and so on,
but F alone should not be chosen, that is, choose those which start with F
followed by a digit (before the digit there could be zero or more than one space) and then by any character from ['a'..'z'] ++
['A'..'Z'] ++ ['0'..'9'] ++ ['(',')',"-"]."
the result should be [F12, F12, F1(a), F2a, F5-A, F34-5], where the space between the F and the digit should be deleted.
With the parsec, I have succeeded in getting one sub-string, such as F12, F2a. The code is as follows:
hao :: Parser Char
hao = oneOf "abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ()-"
tuhao :: Parser String
tuhao = do { c <- char 'F'
; many space
; c1 <- digit
; cs <- many hao
; return (c:c1:cs)
}
parse tuhao "" str -- can parse the str and get one sub-string.
However, I am stuck at how to parse the example string above and get all the sub-strings of the specific pattern. I have an idea that if F is found, then begin parsing, else skip parsing or if parsing fails then skip parsing. But I don't know how to implement the plan. I have another idea that uses State to record the remaining string that is not parsed, and use recursion, but still fail to carry it out.
So I appreciate any tip! ^_^
F12, F 12, F1(a), F2a, F5-A, F34-5
This is an incomplete description, so I'll make some guesses.
I would start by defining a type that can contain the logical parts of these expressions. E.g.
newtype F = F (Int, Maybe String) deriving Show
That is, "F" followed by a number and an optional part that is either letters, parenthesised letters, or a dash followed by letters/digits. Since the number after "F" can have multiple digits, I assume that the optional letters/digits may be multiple, too.
Since the examples are limited, I assume that the following aren't valid: F1a(b), F1(a)b, F1a-5, F1(a)-A, F1a(a)-5, F1a1, F1-(a), etc. and that the following are valid: F1A, F1abc, F1(abc), F1-abc, F1-a1b2. This is probably not true. [1]
I would then proceed to write parsers for each of these sub-parts and compose them:
module Main where
import Text.Parsec
import Data.Maybe (catMaybes)
symbol :: String -> Parser String
symbol s = string s <* spaces
parens :: Parser a -> Parser a
parens = between (string "(") (string ")")
digits :: Parser Int
digits = read <$> many1 digit
parseF :: Parser F
parseF = curry F <$> firstPart <*> secondPart
where
firstPart :: Parser Int
firstPart = symbol "F" >> digits
secondPart :: Parser (Maybe String)
secondPart = optionMaybe $ choice
[ many1 letter
, parens (many1 letter)
, string "-" >> many1 alphaNum
]
(As Jon Purdy writes in a comment,) using this parser on a string to get multiple matches,
extract :: Parser a -> Parser [a]
extract p = do (:) <$> try p <*> extract p
<|> do anyChar >> extract p
<|> do eof >> return []
readFs :: String -> Either ParseError [F]
readFs s = parse (extract parseF) "" s
main :: IO ()
main = print (readFs "F12, F 12, F1(a), F2a, F5-A, F34-5")
This prints:
Right [F (12,Nothing),F (12,Nothing),F (1,Just "a"),F (2,Just "a"),F (5,Just "A"),F (34,Just "5")]
Takeaways:
You can parse optional whitespace using token parsing (symbol).
You can parse optional parts with option, optionMaybe or optional.
You can alternate between combinators using a <|> b <|> c or choice [a, b, c].
When alternating between choices, make sure they don't have overlapping FIRST sets. Otherwise you need to try; this is nasty but sometimes unavoidable. (In this case, FIRST sets for the three choices are letter, string "(" and string "-", i.e. not overlapping.)
[1]: For the sake of restriction, I kept to the assumptions above, but I felt that I could also have assumed that F1a-B, F1(a)-5 and F1(a)-5A are valid, in which case I might change the model to:
newtype F = F (Int, Maybe String, Maybe String)
We can get sub-strings of specific pattern in a string with the
findAll
combinator from
replace-megaparsec.
Notice that this tuhao parser doesn't actually return anything. The findAll combinator just checks for success of the parser to find sub-strings which match the pattern.
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Maybe
import Data.Either
let tuhao :: Parsec Void String ()
tuhao = do
void $ single 'F'
void $ space
void $ digitChar
void $ many $ oneOf "abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ()-"
input = "I want to choose F12 or F 12 from F1(a), F2a, F5-A, F34-5 and so on, but F alone should not be chosen, that is, choose those which start with F followed by a digit (before the digit there could be zero or more than one space) and then by any character from ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9'] ++ ['(',')',\"-\"]."
rights $ fromJust $ parseMaybe (findAll tuhao) input
["F12","F 12","F1(a)","F2a","F5-A","F34-5"]

Parser for JSON String

I'm trying to write a parser for a JSON String.
A valid example, per my parser, would be: "\"foobar\"" or "\"foo\"bar\"".
Here's what I attempted, but it does not terminate:
parseEscapedQuotes :: Parser String
parseEscapedQuotes = Parser f
where
f ('"':xs) = Just ("\"", xs)
f _ = Nothing
parseStringJValue :: Parser JValue
parseStringJValue = (\x -> S (concat x)) <$>
((char '"') *>
(zeroOrMore (alt parseEscapedQuotes (oneOrMore (notChar '"'))))
<* (char '"'))
My reasoning is that, I can have a repetition of either escaped quotes "\"" or characters not equal to ".
But it's not working as I expected:
ghci> runParser parseStringJValue "\"foobar\""
Nothing
I don't know what parser combinator library you are using, but here is a working example using Parsec. I'm using monadic style to make it clearer what's going on, but it is easily translated to applicative style.
import Text.Parsec
import Text.Parsec.String
jchar :: Parser Char
jchar = escaped <|> anyChar
escaped :: Parser Char
escaped = do
char '\\'
c <- oneOf ['"', '\\', 'r', 't' ] -- etc.
return $ case c of
'r' -> '\r'
't' -> '\t'
_ -> c
jstringLiteral :: Parser String
jstringLiteral = do
char '"'
cs <- manyTill jchar (char '"')
return cs
test1 = parse jstringLiteral "" "\"This is a test\""
test2 = parse jstringLiteral "" "\"This is an embedded quote: \\\" after quote\""
test3 = parse jstringLiteral "" "\"Embedded return: \\r\""
Note the extra level of backslashes needed to represent parser input as Haskell string literals. Reading the input from a file would make creating the parser input more convenient.
The definition of the manyTill combinator is:
manyTill p end = scan
where
scan = do{ end; return [] }
<|>
do{ x <- p; xs <- scan; return (x:xs) }
and this might help you figure out why your definitions aren't working.

datatype conversion without using buildExpressionParser

I am stuck at a point in converting a expression entered by the user to my own datatype
I did it using biuldExpressionParser , but using simple parser and recursion I did as follows
openBrace = char '('
closeBrace :: GenParser Char st Char
closeBrace = char ')'
bracketExpr = do
spaces >> openBrace
expr <- expressionParser
spaces >> closeBrace
return expr
bracketExpr will return the entered expression in my own datatype
to convert it into my datatype I did for negation,if expression is a number or a variable as follows:
expressionParser = negate1
<|> number
<|> variable
--<|> addition
<?> "simple expression"
negate1 :: Parser Expr
negate1 = do{ char '-'
;ds <- number
;return (ExprNeg (ds) )
}
<?> "negate"
variable :: Parser Expr
variable = do{ ds<- many1 (letter <|> digit)
; return (ExprVar ds)}
<?> "variable"
number :: Parser Expr
number = do{ ds<- many1 digit
; return (ExprNum (read ds))}
<?> "number"
To do the same for addtion I tried by seperating the expression using sepBy but I am encountering several issues.
If the extered expreesion is 1+2
Then I should getExprAdd (ExprNum 1) (ExprNum 2)
I am unable to proceed further from here .Help would be great.
Thank you.
If you want to be writing a parser with parser combinators you need to think in terms of high-level rules first. Here's a skeleton parser in Parsec; it does not 100% meet your needs because all of the operators are same-precedence and right-associative, whereas you probably want different precedences and left-associativity. Still, this is the basic way to write a parser:
import Text.Parsec
import Text.Parsec.Char
import Data.Char (isDigit)
-- basic data type
data Expr = Op Char Expr Expr | N Integer deriving (Show)
type Parser x = Parsec String () x
-- reverse-sequenced >>, used to implement `parenthesized` and `whitespaced`
(<<) :: Monad m => m x -> m y -> m x
mx << my = mx >>= \x -> my >> return x
infixl 1 <<
parenthesized :: Parser e -> Parser e
parenthesized e = char '(' >> e << char ')'
whitespaced :: Parser e -> Parser e
whitespaced e = spaces >> e << spaces
number :: Parser Expr
number = do
c <- oneOf "123456789" -- leading 0's can be reserved for octal/hexadecimal
cs <- many digit
return (N (read (c:cs)))
operator :: Parser Expr
operator = do
e1 <- expr_no_op
o <- whitespaced (oneOf "+*/-")
e2 <- expression
return (Op o e1 e2)
expr_no_op :: Parser Expr
expr_no_op = whitespaced (try number <|> parenthesized expression)
expression :: Parser Expr
expression = whitespaced (try operator <|> try number <|> parenthesized expression)
Notice that you define tokens (above, just 'number') and then combine them with a "try this <|> try that <|> otherwise..." syntax. Notice also that you have to stop operator from taking an expression as its first argument otherwise you'll get a operator -> expression -> operator loop in the parsing. This is called "left factoring."

Parsec lookahead to handle ints

I'm working on a Parsec parser to handle a somewhat complex data file format (and I have no control over this format).
I've made a lot of progress, but am currently stuck with the following.
I need to be able to parse a line somewhat like this:
4 0.123 1.452 0.667 * 3.460 149 - -
Semantically, the 4 is a nodeNum, the Floats and the * are negative log probabilities (so, * represents the negative log of probability zero). The 149 and the minus signs are really junk, which I can discard, but I need to at least make sure they don't break the parser.
Here's what I have so far:
This handles the "junk" I mentioned. It could probably be simpler, but it works by itself.
emAnnotationSet = (,,) <$> p_int <*>
(reqSpaces *> char '-') <*>
(reqSpaces *> char '-')
the nodeNum at the beginning of the line is handled by another parser that works and I need not get into.
The problem is in trying to pick out all the p_logProbs from the line, without consuming the digits at the beginning of the emAnnotationSet.
the parser for p_logProb looks like this:
p_logProb = liftA mkScore (lp <?> "logProb")
where lp = try dub <|> string "*"
dub = (++) <$> ((++) <$> many1 digit <*> string ".") <*> many1 digit
And finally, I try to separate the logProb entries from the trailing emAnnotationSet (which starts with an integer) as follows:
hmmMatchEmissions = optSpaces *> (V.fromList <$> sepBy p_logProb reqSpaces)
<* optSpaces <* emAnnotationSet <* eol
<?> "matchEmissions"
So, p_logProb will only succeed on a float that begins with digits, includes a decimal point, and then has further digits (this restriction is respected by the file format).
I'd hoped that the try in the p_logProb definition would avoid consuming the leading digits if it didn't parse the decimal and the rest, but this doesn't seem to work; Parsec still complains that it sees an unexpected space after the digits of that integer in the emAnnotationSet:
Left "hmmNode" (line 1, column 196):
unexpected " "
expecting logProb
column 196 corresponds to the space after the integer preceding the minus signs, so it's clear to me that the problem is that the p_logProb parser is consuming the integer. How can I fix this so the p_logProb parser uses lookahead correctly, thus leaving that input for the emAnnotationSet parser?
The integer which terminates the probabilities cannot be mistaken for a probability since it doesn't contain a decimal point. The lexeme combinator converts a parser into one that skips trailing spaces.
import Text.Parsec
import Text.Parsec.String
import Data.Char
import Control.Applicative ( (<$>), (<*>), (<$), (<*), (*>) )
fractional :: Fractional a => Parser a
fractional = try $ do
n <- fromIntegral <$> decimal
char '.'
f <- foldr (\d f -> (f + fromIntegral (digitToInt d))/10.0) 0.0 <$> many1 digit
return $ n + f
decimal :: Parser Int
decimal = foldl (\n d -> 10 * n + digitToInt d) 0 <$> many1 digit
lexeme :: Parser a -> Parser a
lexeme p = p <* skipMany (char ' ')
data Row = Row Int [Maybe Double]
deriving ( Show )
probability :: Fractional a => Parser (Maybe a)
probability = (Just <$> fractional) <|> (Nothing <$ char '*')
junk = lexeme decimal <* count 2 (lexeme $ char '-')
row :: Parser Row
row = Row <$> lexeme decimal <*> many1 (lexeme probability) <* junk
rows :: Parser [Row]
rows = spaces *> sepEndBy row (lexeme newline) <* eof
Usage:
*Main> parseTest rows "4 0.123 1.234 2.345 149 - -\n5 0.123 * 2.345 149 - -"
[Row 4 [Just 0.123,Just 1.234,Just 2.345],Row 5 [Just 0.123,Nothing,Just 2.345]]
I'm not exactly sure of your problem. However, to parse the line given based on your description, it would be much easier to use existing lexers define in Text.Parsec.Token1, and join them together.
The below code parses the line into a Line data type, you can process it further from there if necessary. Instead of attempting to filter out the - and integers before parsing, it uses a parseEntry parser that returns a Just Double if it is a Float value, Just 0 for *, and Nothing for integers and dashes. This is then very simply filtered using catMaybes.
Here is the code:
module Test where
import Text.Parsec
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (haskellDef)
import Control.Applicative ((<$>))
import Data.Maybe (catMaybes)
lexer = P.makeTokenParser haskellDef
parseFloat = P.float lexer
parseInteger = P.natural lexer
whiteSpace = P.whiteSpace lexer
parseEntry = try (Just <$> parseFloat)
<|> try (const (Just 0) <$> (char '*' >> whiteSpace))
<|> try (const Nothing <$> (char '-' >> whiteSpace))
<|> (const Nothing <$> parseInteger)
data Line = Line {
lineNodeNum :: Integer
, negativeLogProbabilities :: [Double]
} deriving (Show)
parseLine = do
nodeNum <- parseInteger
whiteSpace
probabilities <- catMaybes <$> many1 parseEntry
return $ Line { lineNodeNum = nodeNum, negativeLogProbabilities = probabilities }
Example usage:
*Test> parseTest parseLine "4 0.123 1.452 0.667 * 3.460 149 - -"
Line {lineNodeNum = 4, negativeLogProbabilities = [0.123,1.452,0.667,0.0,3.46]}
The only issue that may (or may not) be a problem is it will parse *- as two different tokens, rather than fail at parsing. Eg
*Test> parseTest parseLine "4 0.123 1.452 0.667 * 3.460 149 - -*"
Line {lineNodeNum = 4, negativeLogProbabilities = [0.123,1.452,0.667,0.0,3.46,0.0]}
Note the extra 0.0 at the end of the log probabilities.

Haskell parsec parsing to maybe

Just a simple question that I cannot solve.
I want to parse a string as either a String or a Maybe Double, where an empty string or an "n/a" is parsed as a Nothing. For example something like:
data Value = S String | N (Maybe Double)
value::CharParser () Value
value = val <* spaces
where val = N <$> v_number
<|> S <$> v_string
<|> N <$> v_nothing
I am having trouble with the v_nothing (and also leading and training white space).
Thanks.
EDIT:
v_number :: CharParser () (Maybe Double)
v_number = do s <- getInput
case readSigned readFloat s of
[(n, s')] -> Just n <$ setInput s'
_ -> empty
v_string :: CharParser () String
v_string = (many1 jchar)
where jchar = char '\\' *> (p_escape <|> p_unicode)
<|> satisfy (`notElem` "\"\\")
I tried all sort sorts of things for v_nothing to no avail.
Maybe something like this?
value = do skipMany space
choice $ map try [
do string "n/a" <|> (eof >> return [])
return $ N Nothing,
do d <- many digit
return $ N $ Just (read d)
-- do ...
]

Resources