Translate Haskell parsec to FParsec - haskell

How do I translate this Haskell code:
import Text.ParserCombinators.Parsec((<|>), unexpected, lookAhead, noneOf, char)
import Control.Monad(when)
data BracketElement = BEChar Char | BEChars String | BEColl String | BEEquiv String | BEClass String
p_set_elem_char = do
  c <- noneOf "]"
  when (c == '-') $ do
    atEnd <- (lookAhead (char ']') >> return True) <|> (return False)
    when (not atEnd) (unexpected "A dash is in the wrong place in a bracket")
  return (BEChar c)
to FParsec? Preferably without the monadic syntax, to get good performance.
Thanks in advance, Alexander.
Sorry for the confusion. I have slightly corrected the question so that the Haskell code compiles:
import Text.ParserCombinators.Parsec((<|>), (<?>), unexpected, lookAhead, noneOf, char)
import Control.Monad(when)
import Data.Functor.Identity
import qualified Text.Parsec.Prim as PR
-- | BracketElement is internal to this module
data BracketElement = BEChar Char | BEChars String | BEColl String | BEEquiv String | BEClass String
  deriving Show
p_set_elem_char :: PR.ParsecT [Char] u Identity BracketElement
p_set_elem_char = do
  c <- noneOf "]"
  when (c == '-') $ do
    atEnd <- (lookAhead (char ']') >> return True) <|> (return False)
    when (not atEnd) (unexpected "A dash is in the wrong place in a bracket")
  return (BEChar c)
Now the *p_set_elem_char* computation can be reproduced.
I sincerely thank everyone who helped me.
I wrote my own approximation, but unfortunately it is not as functional as it could be:
type BracketElement = BEChar of char
                    | BEChars of string
                    | BEColl of string
                    | BEEquiv of string
                    | BEClass of string
let p_set_elem_char : Parser<BracketElement, _> =
    fun stream ->
        let stateTag = stream.StateTag
        let reply = (noneOf "]") stream
        let chr = reply.Result
        let mutable reply2 = Reply(BEChar chr)
        if reply.Status = Error && stateTag = stream.StateTag then
            reply2.Status <- Error
            reply2.Error <- reply.Error
        else if chr = '-' && stream.Peek() <> ']' then
            reply2.Status <- Error
            reply2.Error <- messageError ("A dash is in the wrong place in a bracket")
        reply2

Using the BracketElement type in toyvo's answer, you could do something like
let pBEChar : Parser<_, unit> =
    let c =
        pchar '-' .>> followedByL (pchar ']') "A dash is in the wrong place in a bracket"
        <|> noneOf "-]"
    c |>> BEChar

I don't know much FParsec, but here is a naive attempt, corrected a bit for performance based on the comments:
type BracketElement =
    | BEChar of char
    | BEChars of string
    | BEColl of string
    | BEEquiv of string
    | BEClass of string
let parseBEChar : Parser<BracketElement,unit> =
    let okChars = noneOf "]"
    let endTest =
        (lookAhead (skipChar ']') >>. parse.Return(true))
        <|> parse.Return(false)
    let failure = fail "A dash is in the wrong place in a bracket"
    parse {
        let! c = okChars
        if c = '-' then
            let! atEnd = endTest
            if not atEnd then
                return! failure
            else
                return BEChar c
        else
            return BEChar c
    }

Similar to what Daniel proposed, you could write that parser as
let pSetElementChar : Parser<_,unit> =
    satisfy (function '-' | ']' -> false | _ -> true)
    <|> (pchar '-' .>> followedByString "]")
    |>> BEChar
If you want to add your own message to the error, you could use followedByL as in Daniel's answer, or you could add the message using the fail primitive:
let pSetElementChar2 : Parser<_,unit> =
    satisfy (function '-' | ']' -> false | _ -> true)
    <|> (pchar '-' .>> (followedByString "]"
                        <|> fail "A dash is in the wrong place in a bracket"))
    |>> BEChar
A low-level implementation can be as simple as
let pSetElementChar3 : Parser<_,unit> =
    fun stream ->
        let c = stream.ReadCharOrNewline()
        if c <> EOS then
            if c <> '-' || stream.Peek() = ']' then Reply(BEChar c)
            else Reply(Error, messageError "A dash is in the wrong place in a bracket")
        else
            Reply(Error, unexpected "end of input")

Related

How to delete all char of a string in OCaml

Is there an easy way to delete all occurrences of a char in an OCaml string ?
I thought I could use this clean function :
let clean =
  function
  | ' ' | '[' | ']' | '\n' | '>' -> ''
  | x -> x
in
But you can't use '' in OCaml.
So I came up with this solution:
let delete =
  function
  | ' ' | '[' | ']' | '\n' | '>' -> true
  | _ -> false
in
let char_list, size =
  String.fold_left
    (fun (acc, count) c ->
      if delete c then acc, count
      else c::acc, count+1) ([],0) path_string
in
let char_list = ref @@ List.rev char_list in
let aux() =
  match !char_list with
  | [] -> failwith "unexpected"
  | x :: xs -> char_list := xs; x
in
let cleaned_string = String.init size (fun _ -> aux()) in
cleaned_string
But it is long (in Python it is just s.replace('>','') to remove one character) and it only works with OCaml 4.13, which my installation does not support.
If you want to build a string dynamically, you should use a buffer:
let remove_chars erase s =
  let b = Buffer.create 10 in
  String.iter (fun c -> if not (erase c) then Buffer.add_char b c) s;
  Buffer.contents b
(Building a list of characters is extremely inefficient.)
The closest equivalent to Python might be Str.global_replace:
Str.(global_replace (regexp ">") "" s)
or, to remove several characters at once, you could do something like:
Str.(global_replace (regexp "[yourChar1yourChar2yourChar3]") "" s)
So in your case:
Str.(global_replace (regexp "[][> \n]") "" s)

Haskell: Replace a subString in a String without Data.List package

I'm very new to Haskell and I want to know how I can replace a given word in a String with another word. This is my code so far; I know it can't work as written:
treat :: String -> String
treat text = text
main :: IO ()
main = do argv <- getArgs
          text <- readFile "intputText"
          print (separation text)
          print (treat text)
separation :: String -> [String]
separation [] = [""]
separation (c:cs) | c == "\Graph" = "Graphic : " : rest
                  | c == '}' = "" : rest
                  | c == '{' = "" : rest
                  | otherwise = (c : head rest) : tail rest
    where rest = separation cs
I know I can't compare c, which is a Char, with a String in the first guard (c == "\Graph"), so I want to know how I can replace every occurrence of the word "\Graph" in my String text with "Graphic".
I want to be able to do that without importing any package.
If anyone can help me out I'd really appreciate it.
Thank you very much!
replace :: String -> String -> String -> String
replace [] token repl = []
-- an empty token matches nothing, so the string is returned unchanged
replace str [] repl = str
replace str@(s:ss) token@(t:tx) repl
    -- check if first char of string equal to first char of token
    | s == t = case validateToken token str of
        Just list -> repl ++ replace list token repl
        Nothing -> s : replace ss token repl
    -- if not equal then continue recursion step
    | otherwise = s : replace ss token repl
  where
    -- validate if token matches the following chars of the string
    -- returns Nothing if token is not matched
    -- returns the remaining string after the token if token is matched
    validateToken :: String -> String -> Maybe String
    validateToken (a:as) [] = Nothing
    validateToken [] list = Just list
    validateToken (a:as) (x:xs)
        | a == x = validateToken as xs
        | otherwise = Nothing
example = replace "yourString" "token" "new"
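For the "\Graph" case from the question, the same idea can also be split into two small functions. This is only a sketch using nothing beyond the Prelude: dropPrefix and replaceAll are names I made up for illustration (dropPrefix is a hand-written stand-in for Data.List.stripPrefix), and the argument order (token, replacement, input) differs from the replace above.

```haskell
-- Hypothetical helper: if the first string is a prefix of the second,
-- return what is left of the second string after that prefix.
dropPrefix :: String -> String -> Maybe String
dropPrefix [] rest = Just rest
dropPrefix (p : ps) (c : cs) | p == c = dropPrefix ps cs
dropPrefix _ _ = Nothing

-- Hypothetical name: replace every occurrence of token by repl.
replaceAll :: String -> String -> String -> String
replaceAll _ _ [] = []
replaceAll [] _ str = str            -- empty token: nothing to replace
replaceAll token repl str@(c : cs) =
  case dropPrefix token str of
    Just rest -> repl ++ replaceAll token repl rest
    Nothing   -> c : replaceAll token repl cs
```

With this, replaceAll "\\Graph" "Graphic" text rewrites every "\Graph" in text. The backslash has to be doubled inside a Haskell string literal.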

Parser for JSON String

I'm trying to write a parser for a JSON String.
A valid example, per my parser, would be: "\"foobar\"" or "\"foo\"bar\"".
Here's what I attempted, but it does not terminate:
parseEscapedQuotes :: Parser String
parseEscapedQuotes = Parser f
  where
    f ('"':xs) = Just ("\"", xs)
    f _ = Nothing
parseStringJValue :: Parser JValue
parseStringJValue = (\x -> S (concat x)) <$>
    ((char '"') *>
     (zeroOrMore (alt parseEscapedQuotes (oneOrMore (notChar '"'))))
     <* (char '"'))
My reasoning is that, I can have a repetition of either escaped quotes "\"" or characters not equal to ".
But it's not working as I expected:
ghci> runParser parseStringJValue "\"foobar\""
Nothing
I don't know what parser combinator library you are using, but here is a working example using Parsec. I'm using monadic style to make it clearer what's going on, but it is easily translated to applicative style.
import Text.Parsec
import Text.Parsec.String
jchar :: Parser Char
jchar = escaped <|> anyChar
escaped :: Parser Char
escaped = do
  char '\\'
  c <- oneOf ['"', '\\', 'r', 't' ] -- etc.
  return $ case c of
    'r' -> '\r'
    't' -> '\t'
    _ -> c
jstringLiteral :: Parser String
jstringLiteral = do
  char '"'
  cs <- manyTill jchar (char '"')
  return cs
test1 = parse jstringLiteral "" "\"This is a test\""
test2 = parse jstringLiteral "" "\"This is an embedded quote: \\\" after quote\""
test3 = parse jstringLiteral "" "\"Embedded return: \\r\""
Note the extra level of backslashes needed to represent parser input as Haskell string literals. Reading the input from a file would make creating the parser input more convenient.
The definition of the manyTill combinator is:
manyTill p end = scan
  where
    scan = do{ end; return [] }
           <|>
           do{ x <- p; xs <- scan; return (x:xs) }
and this might help you figure out why your definitions aren't working.
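To make the "try the terminator first, otherwise consume one character" order of manyTill concrete, here is a library-free sketch of the same scan on a plain String. stringLiteral and unescape are hypothetical names of my own, and the escape table is deliberately incomplete (no \uXXXX, for instance).

```haskell
-- Hypothetical helper: decode the character that follows a backslash.
unescape :: Char -> Maybe Char
unescape '"'  = Just '"'
unescape '\\' = Just '\\'
unescape 'r'  = Just '\r'
unescape 't'  = Just '\t'
unescape 'n'  = Just '\n'
unescape _    = Nothing

-- Scan one quoted string from the front of the input and return
-- (decoded contents, remaining input), or Nothing on malformed input.
stringLiteral :: String -> Maybe (String, String)
stringLiteral ('"' : rest) = scan rest
  where
    -- The closing quote is tried first, as manyTill tries `end` first.
    scan ('"' : after) = Just ([], after)
    -- A backslash always takes the next character with it, so an
    -- escaped quote never reaches the terminator case above.
    scan ('\\' : c : more) = do
      c' <- unescape c
      (cs, after) <- scan more
      Just (c' : cs, after)
    scan (c : more) = do
      (cs, after) <- scan more
      Just (c : cs, after)
    scan [] = Nothing -- input ended before the closing quote
stringLiteral _ = Nothing
```

The escape case has to look for the backslash, not for the quote itself. A rule that accepts a bare '"' as an "escaped quote" will also swallow the real closing quote.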

End of file unexpected in haskell

I have researched my problem for quite a while without success, so here I am.
I get an "end of file unexpected" error at line 6, column 33.
I have already tried many solutions to resolve my problem.
Here is my code and the file I am trying to parse.
Here is the text I am trying to parse :
ifc.txt :
#9512= IFCBUILDINGSTOREY('3y21AUC9X4yAqzLGUny16E',#16,'Story',$,$,#9509,$,$,.ELEMENT.,6200.);
#9509= IFCLOCALPLACEMENT(#115,#9506);
#9506= IFCAXIS2PLACEMENT3D(#9502,#9498,#9494);
#9502= IFCCARTESIANPOINT((0.,0.,6200.));
#9498= IFCDIRECTION((0.,0.,1.));
#9494= IFCDIRECTION((1.,0.,0.));
Here is the code :
code.hs :
import Text.ParserCombinators.Parsec
main = do
    f <- readFile "ifc.txt"
    let m = (parse ifc "" f)
    print m
ifc :: Parser IfcModel
ifc = many ifcentry
ifcentry = do
    string "#"
    i <- idt
    string "= "
    name <- idt
    string "("
    prop <- idt
    string ")"
    string ";"
    string "\n"
    return (i,name,prop)
idt = many (letter <|> digit <|> char ','
        <|> char '$' <|> char ')' <|> char '\''
        <|> char '=' <|> char ';' <|> char '\n'
        <|> char ' ' <|> char '(' <|> char '#'
        <|> char '.' <|> char '\r')
Thanks for your help. I should have checked my question a bit earlier, because I kept working on my own and found a solution. I will post it when I can (8 hours left for a newbie like me who has less than 10 reputation).
Thanks again.
Solution: use sepBy instead of including the newline in ifcentry
Your ifcentry expects a newline at the end, and your input doesn't have one, which is why the EOF was unexpected.
Drop the string "\n" from ifcentry and instead define
ifc :: Parser IfcModel
ifc = ifcentry `sepBy` (char '\n')
Also, your idt parser is needlessly long. It would be clearer as
idt = many (letter <|> digit <|> oneOf ".,;' =#$()\n\r")
Clearer ifcentry
And while I'm at it, I'd write
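ifcentry = do
    char '#'
    i <- idt
    string "= "
    name <- idt
    prop <- parens idt
    char ';'
    return (i,name,prop)
Because parens (which parses an open bracket, your idt content, then a close bracket) tidies it up and makes it clearer. Note that Text.ParserCombinators.Parsec does not export a standalone parens, so this assumes a local definition such as parens = between (char '(') (char ')').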
Less verbose main
I'd also write
main = fmap (parse ifc "") (readFile "ifc.txt") >>= print
certainly there's no need for
let m = (parse ifc "" f)
print m
because you may as well do
print (parse ifc "" f)
In addition to @enough rep to comment's answer
I would go much further and declare something along the lines of
data IFCType = IFCBuildingStorey ....
             | IFCLocalPlacement ....
             | IFCAxis2Placement3D ....
             | IFCCartesianpoint Double Double Double
             | IFCDirection ....
             deriving Show
and
type ID = Integer
type IFCElement = (ID,IFCType)
where I will show the CartesianPoint as an example
ifctype :: Parser IFCType
ifctype = do string "IFC"
             buildingStorey
               <|> localPlacement
               <|> axis2Placement3D
               <|> cartesianpoint
               <|> direction
buildingStorey :: Parser IFCType
buildingStorey = do string "BUILDINGSTOREY"
                    return IFCBuildingStorey
localPlacement :: Parser IFCType
localPlacement = do string "LOCALPLACEMENT"
                    return IFCLocalPlacement
axis2Placement3D :: Parser IFCType
axis2Placement3D = do string "AXIS2PLACEMENT3D"
                      return IFCAxis2Placement3D
cartesianpoint :: Parser IFCType
cartesianpoint = do string "CARTESIANPOINT"
                    char '('
                    char '('
                    x <- double
                    char ','
                    y <- double
                    char ','
                    z <- double
                    char ')'
                    char ')'
                    return $ IFCCartesianpoint x y z
double :: Parser Double
double = do whole <- many1 digit
            -- the sample data writes "0." and "6200.", which read rejects,
            -- so the fractional part is parsed separately and padded with a 0
            frac <- option "" (char '.' >> many digit)
            return $ read (whole ++ "." ++ frac ++ "0")
direction :: Parser IFCType
direction = do string "DIRECTION"
               return IFCDirection
This has the additional advantage that you get typed models.
Thanks for your help, everyone. I should have checked my question a bit earlier, because I worked on my own and finally found a solution:
import Text.ParserCombinators.Parsec
main = do
    f <- readFile "ifc.txt"
    let m = (parse ifc "" f)
    print m
type IfcIdt = String
type IfcName = String
type IfcProp = [String]
type IfcModel = [(IfcIdt,IfcName,IfcProp)]
ifc :: Parser IfcModel
ifc = many ifcentry
ifcentry = do
    string "#"
    i <- idtnumber
    string "= "
    name <- idtname
    opening
    prop <- ifcprop
    closing
    eol
    return (i,name,prop)
idtnumber = many digit
idtname = many (letter <|> digit)
ifcprop = sepBy prop (char ',')
prop = many (noneOf "=,();\n")
eol = string ";\n"
opening = try (string "((")
      <|> string "("
closing = try (string "))")
      <|> string ")"

Classify lexeme of a input String

Given a String "3 + a * 6", how do I determine the lexemes one by one? I know that my code is missing the classify xs part, but I don't know where to put it. Can anyone help me with this?
(the language is in Haskell)
classify :: String -> String
classify (x:xs)
    | x == '+' = "PLUS"
    | x == '-' = "MINUS"
    | x == '*' = "MULT"
    | x == '/' = "DIV"
    | x == '(' = "LP"
    | x == ')' = "RP"
    | isAlpha x = "VAR"
    | isDigit x = "CONST"
    | otherwise = error "Cannot determine lexeme"
This kind of tokenisation is best left to lexer generators or parser combinators. You can try Alex, at http://www.haskell.org/alex/ , or Parsec, at http://www.haskell.org/haskellwiki/Parsec .
These tools are designed specifically to make tokenisation/scanning (and parsing, in the case of Parsec) easy to use.
If you really only need a tokenizer, here's how you could do it without Parsec. I defined an additional ADT for the token types (you can of course convert that back to strings), and had to change the return type, since you get a sequence of tokens.
import Data.Char (isAlpha, isDigit, isSeparator)
type Error = String
data Token = Plus | Minus | Mult | Div | Lp | Rp
           | Var | Const | Whitespace deriving (Show, Eq)
tokenTable = [('+', Plus), ('-', Minus), ('*', Mult), ('/', Div), ('(', Lp), (')', Rp)]
tokenize :: String -> Either Error [Token]
tokenize "" = Right []
tokenize (x:xs) = case lookup x tokenTable of
    Just t -> fmap (t:) (tokenize xs)
    Nothing -> recognize x where
        recognize x
            | isAlpha x = fmap (Var:) (tokenize xs)
            | isDigit x = fmap (Const:) (tokenize xs)
            | isSeparator x = fmap (Whitespace:) (tokenize xs)
            | otherwise = Left "Cannot determine lexeme"
However, this quickly becomes tedious. In a way it already is, since we have to lift the list consing into Either using fmap. Imagine how you would report the location of the error. Going further essentially means building a monad stack and reimplementing a parser combinator library like Parsec. That's why it's often recommended to use a combinator library directly, and to let it do the lexing as well.
And if you can't or don't want to use full Parsec, it's not too difficult to implement the basic functionality by yourself.
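As a rough idea of what "the basic functionality" amounts to, here is a minimal sketch that uses only base. Everything in it (Parser, satisfy, token, tokenize) is my own illustrative naming, not Parsec's API. It has no error messages or source positions, and unlike the tokenizers above it keeps the text of variables and the value of constants.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isAlpha, isDigit, isSpace)

-- A parser takes the input and returns a result plus the unconsumed rest.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> Just (f a, s')

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing      -> Nothing
    Just (f, s') -> case pa s' of
      Nothing       -> Nothing
      Just (a, s'') -> Just (f a, s'')

-- Left-biased choice; `some` and `many` come for free from Alternative.
instance Alternative Parser where
  empty = Parser (const Nothing)
  Parser p <|> Parser q = Parser $ \s -> case p s of
    Nothing -> q s
    r       -> r

-- Accept a single character that fulfils the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser f
  where
    f (c : cs) | ok c = Just (c, cs)
    f _               = Nothing

data Token = Plus | Minus | Mult | Div | Lp | Rp | Var String | Const Int
  deriving (Show, Eq)

-- One token, with any leading whitespace skipped.
token :: Parser Token
token = many (satisfy isSpace) *> (operator <|> variable <|> constant)
  where
    operator = foldr (<|>) empty [t <$ satisfy (== c) | (c, t) <- table]
    variable = Var <$> some (satisfy isAlpha)
    constant = Const . read <$> some (satisfy isDigit)
    table    = [('+', Plus), ('-', Minus), ('*', Mult), ('/', Div), ('(', Lp), (')', Rp)]

-- Succeeds only if the whole input is consumed.
tokenize :: String -> Maybe [Token]
tokenize s = case runParser (many token <* many (satisfy isSpace)) s of
  Just (ts, "") -> Just ts
  _             -> Nothing
```

For example, tokenize "3 + a * 6" gives Just [Const 3,Plus,Var "a",Mult,Const 6], and an input containing an unknown character gives Nothing. Reporting where the failure happened is the part that starts to need the extra machinery mentioned above.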
You don't need to parse spaces in general. Here is a combination of your and phg's solutions:
import Data.Char
data Token = Plus | Minus | Mult | Div | Lp | Rp | Var | Digit | Undefined
    deriving Show
tokenMap :: String -> Token
tokenMap "+" = Plus
tokenMap "-" = Minus
tokenMap "*" = Mult
tokenMap "/" = Div
tokenMap "(" = Lp
tokenMap ")" = Rp
tokenMap [c]
    | isAlpha c = Var
    | isDigit c = Digit
tokenMap _ = Undefined
classify :: String -> [Token]
classify = map tokenMap . words
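One caveat: words only splits on whitespace, so an input like "3+a" arrives as a single lexeme. If that matters, the splitting step could be done by character class instead. A small sketch follows; lexemes is a hypothetical name, and grouping runs of letters and runs of digits is my assumption about what should count as one lexeme.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

-- Split the input into lexemes without relying on whitespace:
-- runs of letters and runs of digits stay together, whitespace is
-- dropped, and every other character becomes a lexeme of its own.
lexemes :: String -> [String]
lexemes [] = []
lexemes s@(c : cs)
  | isSpace c = lexemes cs
  | isAlpha c = let (w, rest) = span isAlpha s in w : lexemes rest
  | isDigit c = let (n, rest) = span isDigit s in n : lexemes rest
  | otherwise = [c] : lexemes cs
```

It could stand in for words in the classify above. tokenMap would then have to accept multi-character names and numbers rather than only single characters.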
