I am unsure on how to use setPosition (Parsec library). Here is an extremely simple piece of code which should read the first 3 characters of the second line of a text.
import Text.ParserCombinators.Parsec
content = ["This is the first line",
"and this is the second one",
"not to talk about the third one"]
txt = unlines content
main = parseTest myPar txt
myPar = getPosition >>= \oldPos ->
let newPos = setSourceLine oldPos 2 in
setPosition newPos >>
count 3 anyChar
Still, the output is "Thi" and not "and" as I would excpect... I feel I am missing somethning very simple, but alas, I don't know what; can you help me?
The setPosition function changes what position Parsec reports for errors, but does not affect where in the stream of tokens the parsing actually is. It is used as a primitive for back-ends that need to do fancy things: preprocessors that must report positions in other files, parsers that operate on streams of non-Char tokens, and so forth.
Related
I have recently started learning Haskell and have been trying my hand at Parsec. However, for the past couple of days I have been stuck with a problem that I have been unable to find the solution to. So what I am trying to do is write a parser that can parse a string like this:
<"apple", "pear", "pineapple", "orange">
The code that I wrote to do that is:
collection :: Parser [String]
collection = (char '<') *> (string `sepBy` char ',')) <* (char '>')
string :: Parser String
string = char '"' *> (many (noneOf ['\"', '\r', '\n', '"'])) <* char '"'
This works fine for me as it is able to parse the string that I have defined above. Nevertheless, I would now like to enforce the rule that every element in this collection must be unique and that is where I am having trouble. One of the first results I found when searching on the internet was this one, which suggest the usage of the nub function. Although the problem stated in that question is not the same, it would in theory solve my problem. But what I don't understand is how I can apply this function within a Parser. I have tried adding the nub function to several parts of the code above without any success. Later I also tried doing it the following way:
collection :: Parser [String]
collection = do
char '<'
value <- (string `sepBy` char ','))
char '>'
return nub value
But this does not work as the type does not match what nub is expecting, which I believe is one of the problems I am struggling with. I am also not entirely sure whether nub is the right way to go. My fear is that I am going in the wrong direction and that I won't be able to solve my problem like this. Is there perhaps something I am missing? Any advice or help anyone could provide would be greatly appreciated.
The Parsec Parser type is an instance of MonadPlus which means that we can always fail (ie cause a parse error) whenever we want. A handy function for this is guard:
guard :: MonadPlus m => Bool -> m ()
This function takes a boolean. If it's true, it return () and the whole computation (a parse in this case) does not fail. If it's false, the whole thing fails.
So, as long as you don't care about efficiency, here's a reasonable approach: parse the whole list, check for whether all the elements are unique and fail if they aren't.
To do this, the first thing we have to do is write a predicate that checks if every element of a list is unique. nub does not quite do the right thing: it return a list with all the duplicates taken out. But if we don't care much about performance, we can use it to check:
allUnique ls = length (nub ls) == length ls
With this predicate in hand, we can write a function unique that wraps any parser that produces a list and ensures that list is unique:
unique parser = do res <- parser
guard (allUnique res)
return res
Again, if guard is give True, it doesn't affect the rest of the parse. But if it's given False, it will cause an error.
Here's how we could use it:
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\">"
Right ["apple","pear","pineapple","orange"]
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):unknown parse error
This does what you want. However, there's a problem: there is no error message supplied. That's not very user friendly! Happily, we can fix this using <?>. This is an operator provided by Parsec that lets us set the error message of a parser.
unique parser = do res <- parser
guard (allUnique res) <?> "unique elements"
return res
Ahhh, much better:
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):
expecting unique elements
All this works but, again, it's worth noting that it isn't efficient. It parses the whole list before realizing elements aren't unique, and nub takes quadratic time. However, this works and it's probably more than good enough for parsing small to medium-sized files: ie most things written by hand rather than autogenerated.
For instance:
let x = 1 in putStrLn [dump|x, x+1|]
would print something like
x=1, (x+1)=2
And even if there isn't anything like this currently, would it be possible to write something similar?
TL;DR There is this package which contains a complete solution.
install it via cabal install dump
and/or
read the source code
Example usage:
{-# LANGUAGE QuasiQuotes #-}
import Debug.Dump
main = print [d|a, a+1, map (+a) [1..3]|]
where a = 2
which prints:
(a) = 2 (a+1) = 3 (map (+a) [1..3]) = [3,4,5]
by turnint this String
"a, a+1, map (+a) [1..3]"
into this expression
( "(a) = " ++ show (a) ++ "\t " ++
"(a+1) = " ++ show (a + 1) ++ "\t " ++
"(map (+a) [1..3]) = " ++ show (map (+ a) [1 .. 3])
)
Background
Basically, I found that there are two ways to solve this problem:
Exp -> String The bottleneck here is pretty-printing haskell source code from Exp and cumbersome syntax upon usage.
String -> Exp The bottleneck here is parsing haskell to Exp.
Exp -> String
I started out with what #kqr put together, and tried to write a parser to turn this
["GHC.Classes.not x_1627412787 = False","x_1627412787 = True","x_1627412787 GHC.Classes.== GHC.Types.True = True"]
into this
["not x = False","x = True","x == True = True"]
But after trying for a day, my parsec-debugging-skills have proven insufficient to date, so instead I went with a simple regular expression:
simplify :: String -> String
simplify s = subRegex (mkRegex "_[0-9]+|([a-zA-Z]+\\.)+") s ""
For most cases, the output is greatly improved.
However, I suspect this to likely mistakenly remove things it shouldn't.
For example:
$(dump [|(elem 'a' "a.b.c", True)|])
Would likely return:
["elem 'a' \"c\" = True","True = True"]
But this could be solved with proper parsing.
Here is the version that works with the regex-aided simplification: https://github.com/Wizek/kqr-stackoverflow/blob/master/Th.hs
Here is a list of downsides / unresolved issues I've found with the Exp -> String solution:
As far as I know, not using Quasi Quotation requires cumbersome syntax upon usage, like: $(d [|(a, b)|]) -- as opposed to the more succinct [d|a, b|]. If you know a way to simplify this, please do tell!
As far as I know, [||] needs to contain fully valid Haskell, which pretty much necessitates the use of a tuple inside further exacerbating the syntactic situation. There is some upside to this too, however: at least we don't need to scratch our had where to split the expressions since GHC does that for us.
For some reason, the tuple only seemed to accept Booleans. Weird, I suspect this should be possible to fix somehow.
Pretty pretty-printing Exp is not very straight-forward. A more complete solution does require a parser after all.
Printing an AST scrubs the original formatting for a more uniform looks. I hoped to preserve the expressions letter-by-letter in the output.
The deal-breaker was the syntactic over-head. I knew I could get to a simpler solution like [d|a, a+1|] because I have seen that API provided in other packages. I was trying to remember where I saw that syntax. What is the name...?
String -> Exp
Quasi Quotation is the name, I remember!
I remembered seeing packages with heredocs and interpolated strings, like:
string = [qq|The quick {"brown"} $f {"jumps " ++ o} the $num ...|]
where f = "fox"; o = "over"; num = 3
Which, as far as I knew, during compile-time, turns into
string = "The quick " ++ "brown" ++ " " ++ $f ++ "jumps " ++ o ++ " the" ++ show num ++ " ..."
where f = "fox"; o = "over"; num = 3
And I thought to myself: if they can do it, I should be able to do it too!
A bit of digging in their source code revealed the QuasiQuoter type.
data QuasiQuoter = QuasiQuoter {quoteExp :: String -> Q Exp}
Bingo, this is what I want! Give me the source code as string! Ideally, I wouldn't mind returning string either, but maybe this will work. At this point I still know quite little about Q Exp.
After all, in theory, I would just need to split the string on commas, map over it, duplicate the elements so that first part stays string and the second part becomes Haskell source code, which is passed to show.
Turning this:
[d|a+1|]
into this:
"a+1" ++ " = " ++ show (a+1)
Sounds easy, right?
Well, it turns out that even though GHC most obviously is capable to parse haskell source code, it doesn't expose that function. Or not in any way we know of.
I find it strange that we need a third-party package (which thankfully there is at least one called haskell-src-meta) to parse haskell source code for meta programming. Looks to me such an obvious duplication of logic, and potential source of mismatch -- resulting in bugs.
Reluctantly, I started looking into it. After all, if it is good enough for the interpolated-string folks (those packaged did rely on haskell-src-meta) then maybe it will work okay for me too for the time being.
And alas, it does contain the desired function:
Language.Haskell.Meta.Parse.parseExp :: String -> Either String Exp
Language.Haskell.Meta.Parse
From this point it was rather straightforward, except for splitting on commas.
Right now, I do a very simple split on all commas, but that doesn't account for this case:
[d|(1, 2), 3|]
Which fails unfortunatelly. To handle this, I begun writing a parsec parser (again) which turned out to be more difficult than anticipated (again). At this point, I am open to suggestions. Maybe you know of a simple parser that handles the different edge-cases? If so, tell me in a comment, please! I plan on resolving this issue with or without parsec.
But for the most use-cases: it works.
Update at 2015-06-20
Version 0.2.1 and later correctly parses expressions even if they contain commas inside them. Meaning [d|(1, 2), 3|] and similar expressions are now supported.
You can
install it via cabal install dump
and/or
read the source code
Conclusion
During the last week I've learnt quite a bit of Template Haskell and QuasiQuotation, cabal sandboxes, publishing a package to hackage, building haddock docs and publishing them, and some things about Haskell too.
It's been fun.
And perhaps most importantly, I now am able to use this tool for debugging and development, the absence of which has been bugging me for some time. Peace at last.
Thank you #kqr, your engagement with my original question and attempt at solving it gave me enough spark and motivation to continue writing up a full solution.
I've actually almost solved the problem now. Not exactly what you imagined, but fairly close. Maybe someone else can use this as a basis for a better version. Either way, with
{-# LANGUAGE TemplateHaskell, LambdaCase #-}
import Language.Haskell.TH
dump :: ExpQ -> ExpQ
dump tuple =
listE . map dumpExpr . getElems =<< tuple
where
getElems = \case { TupE xs -> xs; _ -> error "not a tuple in splice!" }
dumpExpr exp = [| $(litE (stringL (pprint exp))) ++ " = " ++ show $(return exp)|]
you get the ability to do something like
λ> let x = True
λ> print $(dump [|(not x, x, x == True)|])
["GHC.Classes.not x_1627412787 = False","x_1627412787 = True","x_1627412787 GHC.Classes.== GHC.Types.True = True"]
which is almost what you wanted. As you see, it's a problem that the pprint function includes module prefixes and such, which makes the result... less than ideally readable. I don't yet know of a fix for that, but other than that I think it is fairly usable.
It's a bit syntactically heavy, but that is because it's using the regular [| quote syntax in Haskell. If one wanted to write their own quasiquoter, as you suggest, I'm pretty sure one would also have to re-implement parsing Haskell, which would suck a bit.
I'm trying to write a function in Haskell which checks for some things and then recurses based on some minimal user input. In order to do that I think I have to use do blocks.
cip :: [Argument] -> [Argument] -> Bool -> Bool -> IO()
cip (a:args) pargs burden gameover = do
let nasko = a:pargs
putStrLn (getPremise a)
let newgraph = Carneades.mkArgGraph nasko
let newcaes = (CAES (newgraph,audience2,assStandarts))
let answer = (acceptable (mkProp (getPremise a)) newcaes )
print answer
if(answer==True)
then (cip args nasko burden gameover)
else do
print "One of the arguments is not proved. Here are the premises that need proving"
print (propsForFixing newcaes a)
print "Let's see what you have for the first Propositon"
--add an if to check if no applicable arguments.
print (argumentScanHelp (head (propsForFixing newcaes a)) args)
print "\n Would you like me to apply the firt one? Y/N"
choice <- getLine
if(choice=="Y") then do print "applying the argument"
let applicabee = head (argumentScanHelp (head (propsForFixing newcaes a)) args)
print "Argument targeted"
let newargs = delete applicabee args
let newpargs = applicabee:nasko
print "Argument applied sucsessfuly. Recusing again"
(cip newargs newpargs burden gameover)
return()
It hurts my eyes just by looking at it, but that's do blocks for you.
Everything up to the third do block is okay. But then on this line:
if(choice=="Y") then do print "applying the argument"
let applicabee = head (argumentScanHelp (head (propsForFixing newcaes a)) args)
The complier starts crying:
Main.hs:209:73: parse error on input `let'
Tried a all kinds of different indentations but I can't seem to get it to work.
I don't want to use separate functions, because that means I'll have to pass a lot of arguments around constantly.
Can anyone help me get it right? Also an explanation of what exactly the specifications for nested do block are would be greatly appreciated.
The cause of the error I believe is the misuse of the if expression. You use it as if it were an if statement that exists in most imperative languages. Simply put there must always be an else.
However, in do blocks it makes sense to "not have an else", something like an if statement without an else. Luckily the Control.Monad module will provide you with a function for exactly that:
import Control.Monad (when)
(...)
when (choice=="Y") $ do print "applying the argument"
let applicabee = ...
You seem to already use nested do blocks in the correct way which is good, which basically is that you must indent properly.
PS. Also make sure your last return () is indented like the rest of your code! DS.
I've decided to teach myself Haskell and I have never been so frustrated in all my life. I am going through the tutorial at http://lisperati.com/haskell/ which is the easiest one I could find. All that I am trying to do is read a text file called people.txt which contains a list of numbers and print the length of the list. This code is straight from the tutorial.
import Data.List
type Person = [Int]
main = do
people_text <- readFile "people.txt"
let people :: [Person]
people = read people_text
putStr "Number of people "
putStr (length people_text)
When I try to run the file with runHaskell tutorial03.hs I get this error message
tutorial03.hs:9:13:
Illegal signature in pattern: [Person] people
Use -XScopedTypeVariables to permit it
Using the XScopedTypeVariables flag I get
tutorial03.hs:10:17: Not in scope: type variable `people'
Could someone please explain what I am doing wrong.
Luqui's right that indentation is the issue. The compiler is treating your definition as if it were
let people :: [Person] people = read people_text
which does indeed look like you're writing a type signature in a pattern (and using people both as the function name and a pattern variable, to boot -- peculiar but permitted!).
Crucially, let is a layout keyword, introducing a block of lines which should all be indented to the same horizontal position. By following the signature with a more-indented line, you indicate that you're contuing the line with the signature, rather than starting a new line for the actual definition. If you don't like this fussy layout convention, you can use noisy semicolons.
If you want your definition to be treated as two lines, you either need to be careful to line up the peoples vertically...
let people :: [Person]
people = read people_text
or to signal the line ending explicitly with a semicolon.
let people :: [Person] ;
people = read people_text
The former would be preferable, although I expect most Haskellers would just supply a type annotation for read people_text rather than a signature for the definition, like this:
let people = read people_text :: [Person]
Once you've fixed that, you'll need to contend with the fact that the length of the list is a number, but putStr outputs strings. The print command may be more useful for this purpose.
change it to this, works for me:
...
let people :: [Person]
people = read people_text
...
print (length people_text)
Am still new to Haskell, so apologize if there is an obvious answer to this...
I would like to make a function that splits up the all following lists of strings i.e. [String]:
["int x = 1", "y := x + 123"]
["int x= 1", "y:= x+123"]
["int x=1", "y:=x+123"]
All into the same string of strings i.e. [[String]]:
[["int", "x", "=", "1"], ["y", ":=", "x", "+", "123"]]
You can use map words.lines for the first [String].
But I do not know any really neat ways to also take into account the others - where you would be using the various sub-strings "=", ":=", "+" etc. to break up the main string.
Thank you for taking the time to enlighten me on Haskell :-)
The Prelude comes with a little-known handy function called lex, which is a lexer for Haskell expressions. These match the form you need.
lex :: String -> [(String,String)]
What a weird type though! The list is there for interfacing with a standard type of parser, but I'm pretty sure lex always returns either 1 or 0 elements (0 indicating a parse failure). The tuple is (token-lexed, rest-of-input), so lex only pulls off one token. So a simple way to lex a whole string would be:
lexStr :: String -> [String]
lexStr "" = []
lexStr s =
case lex s of
[(tok,rest)] -> tok : lexStr rest
[] -> error "Failed lex"
To appease the pedants, this code is in terrible form. An explicit call to error instead of returning a reasonable error using Maybe, assuming lex only returns 1 or 0 elements, etc. The code that does this reliably is about the same length, but is significantly more abstract, so I spared your beginner eyes.
I would take a look at parsec and build a simple grammar for parsing your strings.
how about using words .)
words :: String -> [String]
and words wont care for whitespaces..
words "Hello World"
= words "Hello World"
= ["Hello", "World"]