It would seem that Hamlet's $case expression should be remarkably useful, but I can't figure out how one would match against an record type with multiple constructors short of pattern matching (with a unique name) each of the fields. Say I have a data type,
data A = A1 { v1,v2,v3 :: Int }
| A2 { g :: Double}
In my template, I would want to render A1 values differently from A2 values. One would think I could simply do,
$case myA
$of a#(A1 {})
<p>This is an A1: #{show $ v1 a}
$of a#(A2 {})
<p>This is an A2: #{show $ g a}
Unfortunately, this snippet fails to compile with a syntax error, suggesting that the # syntax isn't supported. If I remove the a#, I get another syntax error, this time suggesting that the record brace notation also isn't supported.
Finally, in desperation, once can try,
$case myA
$of A1 _ _ _
...
But alas, even this doesn't compile (conflicting definitions of _). Consequently, it seems that the only option is,
$case myA
$of A1 v1 v2 v3
...
This sort of order-based pattern matching gets extremely tiresome with large datatypes, especially when one is forced to name every field.
So, what am I missing here? Is case analysis in Hamlet really as limited as it seems? What is the recommended way to match against the constructors of a ADT (and later refer to fields)? Is the fact that I even want to do this sort of matching a sign that I'm Doing It Wrong(TM)?
You can track hamlet processing.
The answer is in the non-exposed module Text.Hamlet.Parse where
controlOf = do
_ <- try $ string "$of"
pat <- many1 $ try $ spaces >> ident
_ <- spaceTabs
eol
return $ LineOf pat
where
ident = Ident <$> many1 (alphaNum <|> char '_' <|> char '\'')
so only a sequence of one or more (spaces followed by (identifier or wildcard)) is accepted.
You may extend it from here.
Cheers!
Related
I'm attempting to parse permutations of flags. The behavior I want is "one or more flags in any order, without repetition". I'm using the following packages:
megaparsec
parser-combinators
The code I have is outputting what I want, but is too lenient on inputs. I don't understand why it's accepting multiples of the same flags. What am I doing wrong here?
pFlags :: Parser [Flag]
pFlags = runPermutation $ f <$>
toPermutation (optional (GroupFlag <$ char '\'')) <*>
toPermutation (optional (LeftJustifyFlag <$ char '-'))
where f a b = catMaybes [a, b]
Examples:
"'-" = [GroupFlag, LeftJustifyFlag] -- CORRECT
"-'" = [LeftJustifyFlag, GroupFlag] -- CORRECT
"''''-" = [GroupFlag, LeftJustifyFlag] -- INCORRECT, should fail if there's more than one of the same flag.
Instead of toPermutation with optional, I believe you need to use toPermutationWithDefault, something like this (untested):
toPermutationWithDefault Nothing (Just GroupFlag <$ char '\'')
The reasoning is described in the paper “Parsing Permutation Phrases” (PDF) in §4, “adding optional elements” (emph. added):
Consider, for example […] all permutations of a, b and c. Suppose b can be empty and we want to recognise ac. This can be done in three different ways since the empty b can be recognised before a, after a or after c. Fortunately, it is irrelevant for the result of a parse where exactly the empty b is derived, since order is not important. This allows us to use a strategy similar to the one proposed by Cameron: parse nonempty constituents as they are seen and allow the parser to stop if all remaining elements are optional. When the parser stops the default values are returned for all optional elements that have not been recognised.
To implement this strategy we need to be able to determine whether a parser can derive the empty string and split it into its default value and its non-empty part, i.e. a parser that behaves the same except that it does not recognise the empty string.
That is, the permutation parser needs to know which elements can succeed without consuming input, otherwise it will be too eager to commit to a branch. I don’t know why this would lead to accepting multiples of an element, though; perhaps you’re also missing an eof?
I have recently started learning Haskell and have been trying my hand at Parsec. However, for the past couple of days I have been stuck with a problem that I have been unable to find the solution to. So what I am trying to do is write a parser that can parse a string like this:
<"apple", "pear", "pineapple", "orange">
The code that I wrote to do that is:
collection :: Parser [String]
collection = (char '<') *> (string `sepBy` char ',')) <* (char '>')
string :: Parser String
string = char '"' *> (many (noneOf ['\"', '\r', '\n', '"'])) <* char '"'
This works fine for me as it is able to parse the string that I have defined above. Nevertheless, I would now like to enforce the rule that every element in this collection must be unique and that is where I am having trouble. One of the first results I found when searching on the internet was this one, which suggest the usage of the nub function. Although the problem stated in that question is not the same, it would in theory solve my problem. But what I don't understand is how I can apply this function within a Parser. I have tried adding the nub function to several parts of the code above without any success. Later I also tried doing it the following way:
collection :: Parser [String]
collection = do
char '<'
value <- (string `sepBy` char ','))
char '>'
return nub value
But this does not work as the type does not match what nub is expecting, which I believe is one of the problems I am struggling with. I am also not entirely sure whether nub is the right way to go. My fear is that I am going in the wrong direction and that I won't be able to solve my problem like this. Is there perhaps something I am missing? Any advice or help anyone could provide would be greatly appreciated.
The Parsec Parser type is an instance of MonadPlus which means that we can always fail (ie cause a parse error) whenever we want. A handy function for this is guard:
guard :: MonadPlus m => Bool -> m ()
This function takes a boolean. If it's true, it return () and the whole computation (a parse in this case) does not fail. If it's false, the whole thing fails.
So, as long as you don't care about efficiency, here's a reasonable approach: parse the whole list, check for whether all the elements are unique and fail if they aren't.
To do this, the first thing we have to do is write a predicate that checks if every element of a list is unique. nub does not quite do the right thing: it return a list with all the duplicates taken out. But if we don't care much about performance, we can use it to check:
allUnique ls = length (nub ls) == length ls
With this predicate in hand, we can write a function unique that wraps any parser that produces a list and ensures that list is unique:
unique parser = do res <- parser
guard (allUnique res)
return res
Again, if guard is give True, it doesn't affect the rest of the parse. But if it's given False, it will cause an error.
Here's how we could use it:
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\">"
Right ["apple","pear","pineapple","orange"]
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):unknown parse error
This does what you want. However, there's a problem: there is no error message supplied. That's not very user friendly! Happily, we can fix this using <?>. This is an operator provided by Parsec that lets us set the error message of a parser.
unique parser = do res <- parser
guard (allUnique res) <?> "unique elements"
return res
Ahhh, much better:
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):
expecting unique elements
All this works but, again, it's worth noting that it isn't efficient. It parses the whole list before realizing elements aren't unique, and nub takes quadratic time. However, this works and it's probably more than good enough for parsing small to medium-sized files: ie most things written by hand rather than autogenerated.
For instance:
let x = 1 in putStrLn [dump|x, x+1|]
would print something like
x=1, (x+1)=2
And even if there isn't anything like this currently, would it be possible to write something similar?
TL;DR There is this package which contains a complete solution.
install it via cabal install dump
and/or
read the source code
Example usage:
{-# LANGUAGE QuasiQuotes #-}
import Debug.Dump
main = print [d|a, a+1, map (+a) [1..3]|]
where a = 2
which prints:
(a) = 2 (a+1) = 3 (map (+a) [1..3]) = [3,4,5]
by turnint this String
"a, a+1, map (+a) [1..3]"
into this expression
( "(a) = " ++ show (a) ++ "\t " ++
"(a+1) = " ++ show (a + 1) ++ "\t " ++
"(map (+a) [1..3]) = " ++ show (map (+ a) [1 .. 3])
)
Background
Basically, I found that there are two ways to solve this problem:
Exp -> String The bottleneck here is pretty-printing haskell source code from Exp and cumbersome syntax upon usage.
String -> Exp The bottleneck here is parsing haskell to Exp.
Exp -> String
I started out with what #kqr put together, and tried to write a parser to turn this
["GHC.Classes.not x_1627412787 = False","x_1627412787 = True","x_1627412787 GHC.Classes.== GHC.Types.True = True"]
into this
["not x = False","x = True","x == True = True"]
But after trying for a day, my parsec-debugging-skills have proven insufficient to date, so instead I went with a simple regular expression:
simplify :: String -> String
simplify s = subRegex (mkRegex "_[0-9]+|([a-zA-Z]+\\.)+") s ""
For most cases, the output is greatly improved.
However, I suspect this to likely mistakenly remove things it shouldn't.
For example:
$(dump [|(elem 'a' "a.b.c", True)|])
Would likely return:
["elem 'a' \"c\" = True","True = True"]
But this could be solved with proper parsing.
Here is the version that works with the regex-aided simplification: https://github.com/Wizek/kqr-stackoverflow/blob/master/Th.hs
Here is a list of downsides / unresolved issues I've found with the Exp -> String solution:
As far as I know, not using Quasi Quotation requires cumbersome syntax upon usage, like: $(d [|(a, b)|]) -- as opposed to the more succinct [d|a, b|]. If you know a way to simplify this, please do tell!
As far as I know, [||] needs to contain fully valid Haskell, which pretty much necessitates the use of a tuple inside further exacerbating the syntactic situation. There is some upside to this too, however: at least we don't need to scratch our had where to split the expressions since GHC does that for us.
For some reason, the tuple only seemed to accept Booleans. Weird, I suspect this should be possible to fix somehow.
Pretty pretty-printing Exp is not very straight-forward. A more complete solution does require a parser after all.
Printing an AST scrubs the original formatting for a more uniform looks. I hoped to preserve the expressions letter-by-letter in the output.
The deal-breaker was the syntactic over-head. I knew I could get to a simpler solution like [d|a, a+1|] because I have seen that API provided in other packages. I was trying to remember where I saw that syntax. What is the name...?
String -> Exp
Quasi Quotation is the name, I remember!
I remembered seeing packages with heredocs and interpolated strings, like:
string = [qq|The quick {"brown"} $f {"jumps " ++ o} the $num ...|]
where f = "fox"; o = "over"; num = 3
Which, as far as I knew, during compile-time, turns into
string = "The quick " ++ "brown" ++ " " ++ $f ++ "jumps " ++ o ++ " the" ++ show num ++ " ..."
where f = "fox"; o = "over"; num = 3
And I thought to myself: if they can do it, I should be able to do it too!
A bit of digging in their source code revealed the QuasiQuoter type.
data QuasiQuoter = QuasiQuoter {quoteExp :: String -> Q Exp}
Bingo, this is what I want! Give me the source code as string! Ideally, I wouldn't mind returning string either, but maybe this will work. At this point I still know quite little about Q Exp.
After all, in theory, I would just need to split the string on commas, map over it, duplicate the elements so that first part stays string and the second part becomes Haskell source code, which is passed to show.
Turning this:
[d|a+1|]
into this:
"a+1" ++ " = " ++ show (a+1)
Sounds easy, right?
Well, it turns out that even though GHC most obviously is capable to parse haskell source code, it doesn't expose that function. Or not in any way we know of.
I find it strange that we need a third-party package (which thankfully there is at least one called haskell-src-meta) to parse haskell source code for meta programming. Looks to me such an obvious duplication of logic, and potential source of mismatch -- resulting in bugs.
Reluctantly, I started looking into it. After all, if it is good enough for the interpolated-string folks (those packaged did rely on haskell-src-meta) then maybe it will work okay for me too for the time being.
And alas, it does contain the desired function:
Language.Haskell.Meta.Parse.parseExp :: String -> Either String Exp
Language.Haskell.Meta.Parse
From this point it was rather straightforward, except for splitting on commas.
Right now, I do a very simple split on all commas, but that doesn't account for this case:
[d|(1, 2), 3|]
Which fails unfortunatelly. To handle this, I begun writing a parsec parser (again) which turned out to be more difficult than anticipated (again). At this point, I am open to suggestions. Maybe you know of a simple parser that handles the different edge-cases? If so, tell me in a comment, please! I plan on resolving this issue with or without parsec.
But for the most use-cases: it works.
Update at 2015-06-20
Version 0.2.1 and later correctly parses expressions even if they contain commas inside them. Meaning [d|(1, 2), 3|] and similar expressions are now supported.
You can
install it via cabal install dump
and/or
read the source code
Conclusion
During the last week I've learnt quite a bit of Template Haskell and QuasiQuotation, cabal sandboxes, publishing a package to hackage, building haddock docs and publishing them, and some things about Haskell too.
It's been fun.
And perhaps most importantly, I now am able to use this tool for debugging and development, the absence of which has been bugging me for some time. Peace at last.
Thank you #kqr, your engagement with my original question and attempt at solving it gave me enough spark and motivation to continue writing up a full solution.
I've actually almost solved the problem now. Not exactly what you imagined, but fairly close. Maybe someone else can use this as a basis for a better version. Either way, with
{-# LANGUAGE TemplateHaskell, LambdaCase #-}
import Language.Haskell.TH
dump :: ExpQ -> ExpQ
dump tuple =
listE . map dumpExpr . getElems =<< tuple
where
getElems = \case { TupE xs -> xs; _ -> error "not a tuple in splice!" }
dumpExpr exp = [| $(litE (stringL (pprint exp))) ++ " = " ++ show $(return exp)|]
you get the ability to do something like
λ> let x = True
λ> print $(dump [|(not x, x, x == True)|])
["GHC.Classes.not x_1627412787 = False","x_1627412787 = True","x_1627412787 GHC.Classes.== GHC.Types.True = True"]
which is almost what you wanted. As you see, it's a problem that the pprint function includes module prefixes and such, which makes the result... less than ideally readable. I don't yet know of a fix for that, but other than that I think it is fairly usable.
It's a bit syntactically heavy, but that is because it's using the regular [| quote syntax in Haskell. If one wanted to write their own quasiquoter, as you suggest, I'm pretty sure one would also have to re-implement parsing Haskell, which would suck a bit.
Can I always expect the single single-quote syntax to desugar to the NameG constructor? e.g. does
'x
always desugar to
(Name (OccName "x") (NameG VarName (PkgName "some-package") (ModName "SomeModule")))
This information must always be there, after name resolution, which is the stage Template Haskell runs after, right? And I haven't been able to quote local names, though I'm only interested in quoting top-level names.
Context: I want to write a function that returns the uniquely-qualified identifier. It's a partial function because I can't constrain the input, as Template Haskell doesn't have any GADTs or anything, while I don't want to wrap the output in uncertainty. And I don't want to use a quasi-quoter or splice, if ' will do. I want to prove that this partial function is safe at runtime when used as above, quoting top-level names in the same module, given:
name (Name occ (NameG _ pkg mod)) = Unique occ pkg mod
I want to have a function like:
(<=>) :: Name -> a -> Named a
given:
data Named a = Named a Unique
to annotate variable bindings:
x = 'x
<=> ...
without the user needing to use the heavy splice syntax $(name ...), and invoke splicing at compile time:
x = $(name 'x)
<=> ...
The user will be writing a lot of these, for configuration.
https://downloads.haskell.org/~ghc/7.8.3/docs/html/users_guide/template-haskell.html and https://hackage.haskell.org/package/template-haskell-2.8.0.0/docs/src/Language-Haskell-TH-Syntax.html#Name didn't say.
(p.s. I'd also like to know if the double single-quote syntax (e.g. ''T) had the analogous guarantee, though I'd expect them to be the same).
Since ' quoted names are known at compile time, why don't you change name to be in the Q monad:
name :: Name -> ExpQ
name (Name occ (NameG _ pkg mod)) = [| Unique occ pkg mod |]
name n = fail $ "invalid name: "++ gshow n
Then you use $(name 'show) :: Unique instead of name 'show :: Unique. If you get an invalid Name (say somebody uses mkName), that failure will show up at compile time.
I am trying to parse sgf files (files that describe games of go). In these files there are key value pairs of the form (in the sgf specifiction they are called property id's and property values, but i am using key and value in the hope that people will know what I am talking about when reading the title):
key[value]
or
key[value1][value2]...[valuen]
That is, there might be 1 or many values. The catch is that the type of the value depends in the key. So for example, if the key is B (for play a black stone in go). The value is supposed to be a coordinate described by two letters for example: B[ab]. It might also be that the key is AB (for adding a number of black stones, for setting up the board), then the value is a list of coordinates like this: AB[ab][cd][fg]. It could also be that the key is C (for a comment). Then the value is just a string C[this is a comment].
Of course this could be described by the type
type Property = (String, [String])
But i think it would be nicer to have something like
data Property = B Coordinate | AB [Coordinate] | C String ...
Or maybe some other type that better utilizes the type system and won't require that I convert to and from strings all the time.
The problem is that then I would need a parser that depending on the key type returns a different value type, but I think that would cause type problems since a parser can only return one type of value.
How would you parse something like this?
This is actually a straightforward choice, and doesn't need monadic parsing. I'll use the applicative interface to demonstrate this point.
Build a parser for each property id and its property values somewhat like this:
black = B <$> (char 'C' *> coordinate)
white = W <$> (char 'W' *> coordinate)
addBlack = AB <$> (string "AB" *> many1 coordinate)
(assuming you've built a coordinate parser that eats the brackets and returns something of type Coordinate).
Each one of those has type Parser Property (with your second, better structured data type), so now we just get the parser to choose between them. If the property ids all have different first letters, when you use the parser for the wrong id, it will fail without consuming input, which is ideal for the choice operator:
myparser = black <|> white <|> addBlack
But I suspect there's an AW id for adding white stones, so we'd need to warn that they overlap using try, which backtracks when a parser fails:
mybetterparser = black <|> white <|> (try addBlack <|> try addWhite)
I've bracketed together parsers with a common start and used try on them to go back to the beginning.