End of file unexpected in Haskell

I have researched this problem thoroughly, so here I am.
I get an "end of file unexpected" error at line 6, column 33.
I have already tried many solutions without success.
Here is the file I am trying to parse, followed by my code.
ifc.txt :
#9512= IFCBUILDINGSTOREY('3y21AUC9X4yAqzLGUny16E',#16,'Story',$,$,#9509,$,$,.ELEMENT.,6200.);
#9509= IFCLOCALPLACEMENT(#115,#9506);
#9506= IFCAXIS2PLACEMENT3D(#9502,#9498,#9494);
#9502= IFCCARTESIANPOINT((0.,0.,6200.));
#9498= IFCDIRECTION((0.,0.,1.));
#9494= IFCDIRECTION((1.,0.,0.));
Here is the code :
code.hs :
import Text.ParserCombinators.Parsec
main = do
    f <- readFile "ifc.txt"
    let m = (parse ifc "" f)
    print m
ifc :: Parser IfcModel
ifc = many ifcentry
ifcentry = do
    string "#"
    i <- idt
    string "= "
    name <- idt
    string "("
    prop <- idt
    string ")"
    string ";"
    string "\n"
    return (i,name,prop)
idt = many (letter <|> digit <|> char ','
        <|> char '$' <|> char ')' <|> char '\''
        <|> char '=' <|> char ';' <|> char '\n'
        <|> char ' ' <|> char '(' <|> char '#'
        <|> char '.' <|> char '\r')
Thanks for your help. I should have checked for answers a bit earlier: I kept working on my own and found a solution, which I will post when I can (8 hours left for a newbie like me with less than 10 reputation).
Thanks again.

Solution: use sepBy instead of including the newline in ifcentry
Your ifcentry expects a newline at the end, and your input doesn't have one, which is why the EOF was unexpected.
Drop the string "\n" from ifcentry and instead define
ifc :: Parser IfcModel
ifc = ifcentry `sepBy` (char '\n')
Also, your idt parser is needlessly long. It would be clearer as
idt = many (letter <|> digit <|> oneOf ".,;' =#$()\n\r")
Clearer ifcentry
And while I'm at it, I'd write
ifcentry = do
    char '#'
    i <- idt
    string "= "
    name <- idt
    prop <- parens idt
    char ';'
    return (i,name,prop)
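Because parens (which parses an open bracket, your idt content, then a close bracket) tidies it up and makes it clearer. Note that Parsec only provides parens as a field of a token parser in Text.Parsec.Token; for a plain character parser you can define parens = between (char '(') (char ')').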
Less verbose main
I'd also write
main = fmap (parse ifc "") (readFile "ifc.txt") >>= print
certainly there's no need for
let m = (parse ifc "" f)
print m
because you may as well do
print (parse ifc "" f)
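One more thing that may be worth checking: both versions of idt accept '=', ' ', '(', ';' and '\n', so the first idt can run all the way to the end of the input before string "= " is ever tried, which would also give an unexpected end of file at the last column. Below is a minimal sketch with narrower character classes. The names IfcEntry, ifcEntry, ifcModel and parseIfc are made up for illustration, and keeping the arguments as one raw string is an assumption, not something from the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- (id, type name, raw argument text); this field layout is an assumption.
type IfcEntry = (String, String, String)

-- One line such as: #9509= IFCLOCALPLACEMENT(#115,#9506);
ifcEntry :: Parser IfcEntry
ifcEntry = do
  _    <- char '#'
  i    <- many1 digit                 -- only digits, so it stops at '='
  _    <- string "= "
  name <- many1 alphaNum              -- stops at the opening bracket
  _    <- char '('
  -- take everything up to the closing ");" of the entry
  args <- manyTill anyChar (try (string ");"))
  return (i, name, args)

-- Entries separated by line ends; a final line end is optional.
ifcModel :: Parser [IfcEntry]
ifcModel = (ifcEntry `sepEndBy` endOfLine) <* eof

-- Convenience wrapper that renders the parse error as text.
parseIfc :: String -> Either String [IfcEntry]
parseIfc = either (Left . show) Right . parse ifcModel "ifc.txt"
```

sepEndBy accepts the file with or without a final newline. The manyTill scan would stop early if ");" ever appeared inside a quoted argument, so treat this as a starting point rather than a full IFC parser.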

In addition to @enough rep to comment's answer,
I would go much further and declare something along the lines of
data IFCType = IFCBuildingStorey ....
             | IFCLocalPlacement ....
             | IFCAxis2Placement3D ....
             | IFCCartesianpoint Double Double Double
             | IFCDirection ....
             deriving Show
and
type ID = Integer
type IFCElement = (ID,IFCType)
where I will show CartesianPoint as an example:
ifctype :: Parser IFCType
ifctype = do string "IFC"
             buildingStorey
               <|> localPlacement
               <|> axis2Placement3D
               <|> cartesianpoint
               <|> direction
buildingStorey :: Parser IFCType
buildingStorey = do string "BUILDINGSTOREY"
                    return IFCBuildingStorey
localPlacement :: Parser IFCType
localPlacement = do string "LOCALPLACEMENT"
                    return IFCLocalPlacement
axis2Placement3D :: Parser IFCType
axis2Placement3D = do string "AXIS2PLACEMENT3D"
                      return IFCAxis2Placement3D
cartesianpoint :: Parser IFCType
cartesianpoint = do string "CARTESIANPOINT"
                    char '('
                    char '('
                    x <- double
                    char ','
                    y <- double
                    char ','
                    z <- double
                    char ')'
                    char ')'
                    return $ IFCCartesianpoint x y z
double :: Parser Double
double = do whole <- many1 digit
            -- IFC writes reals such as "0." with no fraction digits, which read rejects
            frac <- option "" (char '.' >> many digit)
            return $ read (whole ++ "." ++ frac ++ "0")
direction :: Parser IFCType
direction = do string "DIRECTION"
               return IFCDirection
This has the additional advantage that your models are typed.

Thanks for your help, everyone. I should have checked for answers a bit earlier: I kept working on my own and finally found a solution:
import Text.ParserCombinators.Parsec
main = do
    f <- readFile "ifc.txt"
    let m = (parse ifc "" f)
    print m
type IfcIdt = String
type IfcName = String
type IfcProp = [String]
type IfcModel = [(IfcIdt,IfcName,IfcProp)]
ifc :: Parser IfcModel
ifc = many ifcentry
ifcentry = do
    string "#"
    i <- idtnumber
    string "= "
    name <- idtname
    opening
    prop <- ifcprop
    closing
    eol
    return (i,name,prop)
idtnumber = many digit
idtname = many (letter <|> digit)
ifcprop = sepBy prop (char ',')
prop = many (noneOf "=,();\n")
eol = string ";\n"
opening = try (string "((")
      <|> string "("
closing = try (string "))")
      <|> string ")"

Related

Parsec3 Text parser for quoted string, where everything is allowed in between quotes

I have actually asked this question before (here), but it turns out that the solution provided did not handle all test cases. Also, I need a 'Text' parser rather than a 'String' one, so I need parsec3.
The parser should allow EVERY type of char in between the quotes, even quotes. The end of the quoted text is marked by a ' character followed by |, a space, or end of input.
So,
'aa''''|
should return a string
aa'''
This is what I have:
import Control.Monad (liftM)
import Data.Text (Text, pack)
import Text.Parsec
import Text.Parsec.Text
quotedLabel :: Parser Text
quotedLabel = do -- reads the first quote.
    spaces
    string "'"
    lab <- liftM pack $ endBy1 anyChar endOfQuote
    return lab
endOfQuote = do
    string "'"
    try(eof) <|> try( oneOf "| ")
Now, the problem here is of course that eof has a different type than oneOf "| ", so compilation fails.
How do I fix this? Is there a better way to achieve what I am trying to do?
Whitespace
First a comment on handling white space...
Generally the practice is to write your parsers so that they
consume the whitespace following a token
or syntactic unit. It's common to define a combinator like:
lexeme p = p <* spaces
to easily convert a parser p to one that discards the whitespace
following whatever p parses. E.g., if you have
number = many1 digit
simply use lexeme number whenever you want to eat up the
whitespace following the number.
For more on this approach to handling whitespace and other advice
on parsing languages, see this Megaparsec tutorial.
Label expressions
Based on your previous SO question it appears you want
to parse expressions of the form:
label1 | label2 | ... | labeln
where each label may be a simple label or a quoted label.
The idiomatic way to parse this pattern is to use sepBy like this:
labels :: Parser [String]
labels = sepBy1 (try quotedLabel <|> simpleLabel) (char '|')
We define both simpleLabel and quotedLabel in terms of
what characters may occur in them. For simpleLabel a valid
character is a non-| and non-space:
simpleLabel :: Parser String
simpleLabel = many (noneOf "| ")
A quotedLabel is a single quote followed by a run
of valid quotedLabel-characters followed by an ending
single quote:
sq = '\''
quotedLabel :: Parser String
quotedLabel = do
    char sq
    chs <- many validChar
    char sq
    return chs
A validChar is either a non-single quote or a single
quote not followed by eof or a vertical bar:
validChar = noneOf [sq] <|> try validQuote
validQuote = do
    char sq
    notFollowedBy eof
    notFollowedBy (char '|')
    return sq
The first notFollowedBy will fail if the single quote appears just
before the end of input. The second notFollowedBy will fail if
next character is a vertical bar. Therefore the sequence of the two
will succeed only if there is a non-vertical bar character following
the single quote. In this case the single quote should be interpreted
as part of the string and not the terminating single quote.
Unfortunately this doesn't quite work because the
current implementation of notFollowedBy
will always succeed with a parser which does not consume any
input -- i.e. like eof. (See this issue for more details.)
To work around this problem we can use this alternate
implementation:
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
        do {a <- try p; return (unexpected (show a));}
    <|> return (return ())
Here is the complete solution with some tests. By adding a few lexeme
calls you can make this parser eat up any white space where you decide
it is not significant.
import Text.Parsec hiding (labels)
import Text.Parsec.String
import Control.Monad
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
        do {a <- try p; return (unexpected (show a));}
    <|> return (return ())
sq = '\''
validChar = do
    noneOf "'" <|> try validQuote
validQuote = do
    char sq
    notFollowedBy' eof
    notFollowedBy (char '|')
    return sq
quotedLabel :: Parser String
quotedLabel = do
    char sq
    str <- many validChar
    char sq
    return str
plainLabel :: Parser String
plainLabel = many (noneOf "| ")
labels :: Parser [String]
labels = sepBy1 (try quotedLabel <|> try plainLabel) (char '|')
test input expected = do
    case parse (labels <* eof) "" input of
        Left e -> putStrLn $ "error: " ++ show e
        Right v -> if v == expected
            then putStrLn $ "OK - got: " ++ show v
            else putStrLn $ "NOT OK - got: " ++ show v ++ " expected: " ++ show expected
test1 = test "a|b|c" ["a","b","c"]
test2 = test "a|'b b'|c" ["a", "b b", "c"]
test3 = test "'abc''|def" ["abc'", "def" ]
test4 = test "'abc'" ["abc"]
test5 = test "x|'abc'" ["x","abc"]
To change the result of any functor computation you can just use:
fmap (const x) functor_comp
e.g.:
getLine :: IO String
fmap (const ()) getLine :: IO ()
eof :: Parser ()
oneOf "| " :: Parser Char
fmap (const ()) (oneOf "| ") :: Parser ()
Another option is to use operators from Control.Applicative:
getLine *> return 3 :: IO Integer
This performs getLine, discards the result and returns 3.
In your case, you might use:
try(eof) <|> try( oneOf "| " *> return ())
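Putting the type fix together with the stated rule (a closing ' must be followed by |, a space, or end of input), here is a rough sketch for Text input. It swaps endBy1 for manyTill with a lookAhead terminator, which is my own choice and not something from the question; runQuoted is a hypothetical helper added only to make the example easy to try.

```haskell
import Data.Functor (void)
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec
import Text.Parsec.Text (Parser)

-- A closing quote: a ' that is followed by '|', a space, or end of input.
-- void turns the Char result of oneOf into (), so both branches agree.
-- lookAhead checks the follower without consuming it.
endOfQuote :: Parser ()
endOfQuote = char '\'' *> lookAhead (eof <|> void (oneOf "| "))

-- Opening quote, then any characters up to the first closing quote.
quotedLabel :: Parser Text
quotedLabel = do
  spaces
  _ <- char '\''
  T.pack <$> manyTill anyChar (try endOfQuote)

-- Hypothetical helper: run the parser on a String, errors rendered as text.
runQuoted :: String -> Either String Text
runQuoted = either (Left . show) Right . parse quotedLabel "" . T.pack
```

With this, runQuoted "'aa''''|" gives the label aa''' from the question, and the | is left in the input for whatever parser comes next.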

Parser for JSON String

I'm trying to write a parser for a JSON String.
A valid example, per my parser, would be: "\"foobar\"" or "\"foo\"bar\"".
Here's what I attempted, but it does not terminate:
parseEscapedQuotes :: Parser String
parseEscapedQuotes = Parser f
  where
    f ('"':xs) = Just ("\"", xs)
    f _ = Nothing
parseStringJValue :: Parser JValue
parseStringJValue = (\x -> S (concat x)) <$>
    ((char '"') *>
     (zeroOrMore (alt parseEscapedQuotes (oneOrMore (notChar '"'))))
     <* (char '"'))
My reasoning is that I can have a repetition of either escaped quotes "\"" or characters not equal to ".
But it's not working as I expected:
ghci> runParser parseStringJValue "\"foobar\""
Nothing
I don't know what parser combinator library you are using, but here is a working example using Parsec. I'm using monadic style to make it clearer what's going on, but it is easily translated to applicative style.
import Text.Parsec
import Text.Parsec.String
jchar :: Parser Char
jchar = escaped <|> anyChar
escaped :: Parser Char
escaped = do
    char '\\'
    c <- oneOf ['"', '\\', 'r', 't' ] -- etc.
    return $ case c of
        'r' -> '\r'
        't' -> '\t'
        _ -> c
jstringLiteral :: Parser String
jstringLiteral = do
    char '"'
    cs <- manyTill jchar (char '"')
    return cs
test1 = parse jstringLiteral "" "\"This is a test\""
test2 = parse jstringLiteral "" "\"This is an embedded quote: \\\" after quote\""
test3 = parse jstringLiteral "" "\"Embedded return: \\r\""
Note the extra level of backslashes needed to represent parser input as Haskell string literals. Reading the input from a file would make creating the parser input more convenient.
The definition of the manyTill combinator is:
manyTill p end = scan
  where
    scan = do{ end; return [] }
        <|>
           do{ x <- p; xs <- scan; return (x:xs) }
and this might help you figure out why your definitions aren't working.
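For what it's worth, here is roughly how the same parser might look in applicative style. It is a sketch that handles only the same handful of escapes as above; unescape and runJString are names I made up for the example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A backslash followed by one of a few known escape characters.
escaped :: Parser Char
escaped = char '\\' *> (unescape <$> oneOf "\"\\rt")
  where
    unescape 'r' = '\r'
    unescape 't' = '\t'
    unescape c   = c

-- Opening quote, then characters until the first unescaped closing quote.
-- manyTill tries the end parser first, so a bare '"' always terminates.
jstringLiteral :: Parser String
jstringLiteral = char '"' *> manyTill (escaped <|> anyChar) (char '"')

-- Hypothetical helper with the parse error rendered as text.
runJString :: String -> Either String String
runJString = either (Left . show) Right . parse jstringLiteral ""
```

The important part is the same as in the monadic version: the escape is tried before anyChar, so a backslash-quote pair is consumed as one unit and never reaches the terminator check.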

Parse a list whose separator may also occur at the end

I am trying to parse some text, but I can't understand how to parse a list of symbols separated by some separator, which may or may not occur also at the end of the list.
Example (numbers separated by spaces):
set A = 1 2 3 4 5;
set B =6 7 8 9;
set C = 10 11 12 ;
If I use sepBy, I get an error after the last space because it expects another digit, even if I also try to read many whitespace after the list. If I use endBy, I get an error when the space is missing.
import Text.ParserCombinators.Parsec
main :: IO ()
main = do
    let input = "set A = 1 2 3 4 5;\n" ++
                "set B =6 7 8 9;\n" ++
                "set C = 10 11 12 ;\n"
    case parse parseInput "(unknown)" input of
        Left msg ->
            print msg
        Right rss ->
            mapM_ (\(n, vs) -> putStrLn (n ++ " = " ++ show vs)) rss
whitespace :: GenParser Char st Char
whitespace = oneOf " \t"
parseInput :: GenParser Char st [(String, [Int])]
parseInput = parseRow `endBy` newline
parseRow :: GenParser Char st (String, [Int])
parseRow = do
    string "set"
    many1 whitespace
    name <- many1 alphaNum
    many whitespace
    string "="
    many whitespace
    values <- many1 digit `sepBy` many1 whitespace
    many whitespace
    string ";"
    return (name, map read values)
The combinator I think you want is sepEndBy. Using it gives you
-- I'm using the type synonym
-- type Parser = GenParser Char ()
-- from Text.ParserCombinators.Parsec.Prim
parseRow :: Parser (String, [Int])
parseRow = do
    string "set" >> many1 whitespace
    name <- many1 alphaNum
    spaces >> char '=' >> spaces
    values <- many1 digit `sepEndBy` many1 whitespace
    char ';'
    return (name, map read values)
  where spaces = many whitespace
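To see the difference in isolation, here is a small sketch that runs the same number list through sepBy and sepEndBy. The names blanks, rowSepBy, rowSepEndBy and run are invented for the example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One or more spaces or tabs, used as the separator.
blanks :: Parser String
blanks = many1 (oneOf " \t")

-- sepBy: once a separator has been consumed, another number must follow.
rowSepBy :: Parser [Int]
rowSepBy = map read <$> (many1 digit `sepBy` blanks) <* char ';'

-- sepEndBy: a separator after the last number is allowed but not required.
rowSepEndBy :: Parser [Int]
rowSepEndBy = map read <$> (many1 digit `sepEndBy` blanks) <* char ';'

-- Hypothetical helper with the parse error rendered as text.
run :: Parser a -> String -> Either String a
run p = either (Left . show) Right . parse p ""
```

run rowSepEndBy accepts both "6 7 8 9;" and "10 11 12 ;", while run rowSepBy rejects the second one because the separator has already consumed the trailing space by the time the digit parser fails.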

Haskell parsec parsing to maybe

Just a simple question that I cannot solve.
I want to parse a string as either a String or a Maybe Double, where an empty string or an "n/a" is parsed as a Nothing. For example something like:
data Value = S String | N (Maybe Double)
value::CharParser () Value
value = val <* spaces
  where val = N <$> v_number
          <|> S <$> v_string
          <|> N <$> v_nothing
I am having trouble with the v_nothing (and also leading and trailing white space).
Thanks.
EDIT:
v_number :: CharParser () (Maybe Double)
v_number = do s <- getInput
              case readSigned readFloat s of
                [(n, s')] -> Just n <$ setInput s'
                _ -> empty
v_string :: CharParser () String
v_string = (many1 jchar)
  where jchar = char '\\' *> (p_escape <|> p_unicode)
            <|> satisfy (`notElem` "\"\\")
I tried all sorts of things for v_nothing to no avail.
Maybe something like this?
value = do skipMany space
           choice $ map try [
               do string "n/a" <|> (eof >> return [])
                  return $ N Nothing,
               do d <- many1 digit
                  return $ N $ Just (read d)
               -- do ...
               ]
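A slightly fuller sketch of the same idea, assuming the whole input is a single field: each alternative has to reach end of input, and try backtracks to the next one otherwise. The number parser here is deliberately simple (unsigned, digits with an optional fraction) and is not the readSigned version from the question; lexeme, nothing, num, str and runValue are names chosen for the example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Value = S String | N (Maybe Double)
  deriving (Eq, Show)

-- Run a parser, then drop any whitespace after it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Unsigned number with an optional fraction, e.g. 12 or 12.5.
number :: Parser Double
number = do
  whole <- many1 digit
  frac  <- option "0" (char '.' *> many1 digit)
  return (read (whole ++ "." ++ frac))

-- Empty input or "n/a" both mean "no value".
nothing :: Parser Value
nothing = N Nothing <$ lexeme (option "" (string "n/a"))

num :: Parser Value
num = N . Just <$> lexeme number

-- Fallback: anything non-empty is kept as a string.
str :: Parser Value
str = S <$> many1 anyChar

-- Try each alternative in turn; each must consume the whole field.
value :: Parser Value
value = spaces *> choice (map (\p -> try (p <* eof)) [nothing, num, str])

-- Hypothetical helper with the parse error rendered as text.
runValue :: String -> Either String Value
runValue = either (Left . show) Right . parse value ""
```

Leading whitespace is skipped once up front, and trailing whitespace is dropped for the nothing and number cases; the string case keeps whatever follows, so trim it separately if that matters.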

Parsing FB2(XML) in Haskell

Having started to learn Haskell, I decided to get acquainted with Parsec, but I ran into problems. I'm trying to parse books in the FB2 format. Plain tags that contain only text work fine, but a tag nested within a tag does not.
import Text.ParserCombinators.Parsec
data FB2Doc = Node String FB2Doc
            | InnText String
            deriving (Eq,Show)
parseFB2 :: GenParser Char st [FB2Doc]
parseFB2 = many test
test :: GenParser Char st FB2Doc
test = do name <- nodeStart
          value <- getvalue
          nodeEnd
          return $ Node name value
nodeStart = do char '<'
               name <- many (letter <|> digit <|> oneOf "-_")
               char '>'
               return name
nodeEnd = do string "</"
             many (letter <|> digit)
             char '>'
             spaces
gettext = do x <- many (letter <|> digit <|> oneOf "-_")
             return $ InnText x
getvalue = do (nodeStart >> test) <|> gettext <|> return (Node "" (InnText ""))
main = do
    print $ parse parseFB2 "" "<h1><a2>ge</a2></h1> <genre>history_russia</genre>"
I think you want this:
getvalue = try test <|> gettext
The try is needed for empty nodes: "<bla></bla>". test will consume the '<' of </bla>, and the try allows for backtracking.
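Here is a compact sketch of the whole parser with that fix applied, in case it helps to see it in one piece. It folds nodeStart, nodeEnd and gettext into a single node parser; tagName, node and runFB2 are names I picked, and the closing tag is not checked against the opening one, same as in the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data FB2Doc = Node String FB2Doc
            | InnText String
            deriving (Eq, Show)

-- Characters allowed in tag names and in text (same set as the question).
tagName :: Parser String
tagName = many (alphaNum <|> oneOf "-_")

-- <name> followed by a nested node or plain text, then a closing tag.
-- try lets a failed nested-node attempt give back the '<' of "</name>".
node :: Parser FB2Doc
node = do
  name  <- char '<' *> tagName <* char '>'
  value <- try node <|> (InnText <$> tagName)
  _     <- string "</" *> tagName <* char '>'
  spaces
  return (Node name value)

-- Hypothetical helper: parse a whole document, error rendered as text.
runFB2 :: String -> Either String [FB2Doc]
runFB2 = either (Left . show) Right . parse (many node <* eof) ""
```

On the input from the question this gives one h1 node wrapping an a2 node with the text ge, followed by a genre node, and "<bla></bla>" comes out as a node with empty text.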

Resources