How can I easily express that I don't care about a value of a particular data field?

I was writing tests for my parser, using a method which might not be the best but has been working for me so far. The tests assumed a perfectly defined AST representation for every code block, like so:
(parse "x = 5") `shouldBe` (Block [Assignment [LVar "x"] [Number 5.0]])
However, when I moved to more complex cases, a need for more "fuzzy" verification arose:
(parse "t.x = 5") `shouldBe` (Block [Assignment [LFieldRef (Var "t") (StringLiteral undefined "x")] [Number 5.0]])
I put undefined in this example to mark the field I don't want compared against the result of parse (it's the source position of a string literal). Right now the only fix I can see is rewriting the code to use shouldSatisfy instead of shouldBe, which I'll have to do if I don't find any other solution.

You can write a normalizePosition function which replaces all the position data in your AST with some fixed dummyPosition value, and then use shouldBe against a pattern built from the same dummy value.
If the AST is very involved, consider writing this normalization using Scrap Your Boilerplate (the syb package).
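For instance, a minimal sketch using syb, assuming the AST derives Data and stores positions in a single position type (the SourcePos below is a hypothetical stand-in for whatever your parser records):

{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, Typeable)
import Data.Generics (everywhere, mkT)

-- Hypothetical position type; substitute your parser's own.
data SourcePos = SourcePos { posLine :: Int, posCol :: Int }
  deriving (Eq, Show, Data, Typeable)

dummyPosition :: SourcePos
dummyPosition = SourcePos 0 0

-- Rewrite every SourcePos anywhere inside a Data-deriving value.
normalizePosition :: Data a => a -> a
normalizePosition = everywhere (mkT (const dummyPosition))

Tests then compare normalizePosition (parse input) against an expected tree built with dummyPosition.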

One way to solve this is to parametrize your AST over source locations:
{-# LANGUAGE DeriveFunctor #-}

data AST a = ...
  deriving (Eq, Show, Functor)
Your parse function would then return an AST with SourceLocations:
parse :: String -> AST SourceLocation
As we derived a Functor instance above, we can easily replace source locations with something else, e.g. ():
import Data.Functor ((<$))
parseTest :: String -> AST ()
parseTest input = () <$ parse input
Now, just use parseTest instead of parse in your specs.
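To make this concrete, here is a toy AST in that style (constructors invented for illustration):

{-# LANGUAGE DeriveFunctor #-}
import Data.Functor ((<$))

-- Toy AST; each node carries its source location in the type parameter.
data AST a
  = Number a Double
  | Var a String
  deriving (Eq, Show, Functor)

-- Stripping locations makes structurally equal trees compare equal:
-- ghci> (() <$ Number (1,8) 5.0) == (() <$ Number (0,0) 5.0)
-- True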

Related

How to pattern match an abstract data type when the data constructor isn't in scope?

I'm writing a parser library using Parsec combinators, and I want to unit test some of my parsers. So I have a simple parser:
dash :: GenParser Char st Char
dash = char '-'
I'd like to write some tests for it. The positive test is pretty easy:
spec :: Spec
spec = do
  describe "dash" $ do
    it "parses a dash" $
      parse dash "N/A" "-" `shouldBe` (Right '-')
I'd like to write a negative test as well. When the parser doesn't match, it returns Left of a ParseError. I'd like to write a test that validates the exact message that the ParseError contains. So what I'd really like to do is something like
spec :: Spec
spec = do
  describe "dash" $ do
    it "doesn't parse an underscore" $
      parse dash "N/A" "_" `shouldSatisfy` (hasErrorMessage "not a dash")

hasErrorMessage expected (Left (ParseError _ msgs)) = msgs == expected
hasErrorMessage expected _ = False
But I'm having trouble writing this sort of code, since the ParseError data constructor isn't exported from Text.Parsec.Error.
Is there any way to use pattern matching on types where no data constructor for the type is in scope?
I know I could write hasErrorMessage something like
hasErrorMessage :: String -> (Either ParseError a) -> Bool
hasErrorMessage expected (Left pe) = elem expected $ fmap messageString (errorMessages pe)
but I'd like to understand this nuance, too.
Although the data constructor isn't exported, functions to access its parameters are. You can use these in combination with view patterns to sort of get what you want. In your case, the pattern (errorMessages -> msgs) can stand in almost perfectly for (ParseError _ msgs), with two caveats:
You need {-# LANGUAGE ViewPatterns #-} to use this feature.
errorMessages sorts the messages, which a pattern match on the data constructor wouldn't do.
You can even use this technique with pattern synonyms to make a fake data constructor, so you can use the exact same syntax you would otherwise:
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}

pattern ParseError pos msgs <- ((,) <$> errorPos <*> errorMessages -> (pos, msgs)) where
  ParseError pos msgs = foldr addErrorMessage (newErrorUnknown pos) msgs
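With that synonym in scope, the asker's hasErrorMessage can be written essentially as first envisioned (a sketch; errorMessages and messageString are the real accessors from Text.Parsec.Error):

import Text.Parsec.Error (messageString)

hasErrorMessage :: String -> Either ParseError a -> Bool
hasErrorMessage expected (Left (ParseError _ msgs)) =
  expected `elem` map messageString msgs
hasErrorMessage _ _ = False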

How to combine Megaparsec with Text.Read (derived Read instance)

I want to use derived Read instances with megaparsec.
How can I use Text.Read.read or Text.Read.readEither in a Parser a?
It need not be fast, but it should be easy to maintain and extend.
The megaparsec parser is for testing my application via a CLI, so many different datatypes must be parsed.
It should work in the following way:
import Text.Megaparsec

readableDatatype :: Read a => Parser a
readableDatatype =
  -- This is wrong, but describes how it shall work
  -- liftA read chunkToTokens

expr' :: Parser UserControlExpr
expr' = timeExpr
    <|> timeEventExpr
    <|> digiInExpr
    <|> quitExpr

digiInExpr :: Parser UserControlExpr
digiInExpr = do
  cmdword "digiIn"
  inElement <- (readableDatatype :: Parser TI_I)
  return $ UserDigiIn inElement
What do I have to write so that the three functions typecheck, especially readableDatatype?
You can use getInput :: MonadParsec e s m => m s and setInput :: MonadParsec e s m => s -> m () together with reads :: Read a => String -> [(a, String)] for that. getInput and setInput just get and set the input stream the parser is working on and reads takes a string and returns a list of possible parses together with the remaining unconsumed portions of the input. We also need to tell the parser the new offset in the input, otherwise error locations are wrong. We can do that using getOffset and setOffset.
-- For the equality constraint (~)
{-# LANGUAGE TypeFamilies #-}

import Text.Megaparsec
import Text.Read (reads)

readableDatatype :: (Read a, MonadParsec e s m, s ~ String) => m a
readableDatatype = do
  input  <- getInput
  offset <- getOffset
  choice $
    (\(a, input') -> a <$ setInput input'
                       <* setOffset (offset + length input - length input'))
      <$> reads input
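A quick usage sketch, assuming the conventional Parser alias for megaparsec:

import Data.Void (Void)

type Parser = Parsec Void String

-- ghci> parseTest (readableDatatype :: Parser Int) "42"
-- 42
-- ghci> parseTest (readableDatatype :: Parser (Bool, Int)) "(True, 5)"
-- (True,5)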
If your input is something other than String you will have to convert between that and String after getInput and before setInput.
The following is about performance concerns, so it is not really relevant to your problem, but it may be educational and useful to others who need a solution with good performance.
Converting the whole input between String and some other type all the time during parsing is a rather big performance bottleneck for larger inputs. Furthermore, using length to calculate the new offset is not very performant either.
To solve both of these problems, we need some way of knowing how much of the input was actually consumed by the Read-parser, so that we can simply drop that part from the original input instead of converting the whole unconsumed part back to the original input type. But the Read class does not offer that. One could try to parse incrementally longer prefixes of the input, which may be faster in cases where the parses done using Read are short compared to the length of the entire input. You could also use unsafePerformIO to record in an IORef how much of the input was actually forced by the Read-parser, which would be the fastest, but not the prettiest, solution.
I implemented the latter here. Feel free to use it, but be aware that it is not very well tested. It does however solve all the problems with the above approach.
That did it, thank you! In the meantime I made a "conservative" solution to the problem by defining the constructors as strings and parsing those, without using read. That has the advantage that you get megaparsec's impressive error messages, which tell you which symbols are missing.
Example with read:
1:8:
|
1 | digiIn TI_I_Signal1 DirA Dectivated
| ^
unknown parse error
(only an 'a' was missing in "Deactivated")
Example with a hand-written parser for the datatype:
1:19:
|
1 | digiIn TI_I_Signal1 Dectivated
| ^^^^^^^^
unexpected "Dectivat"
expecting "active", "inactive", '0', or '1'
I think I will use your code block in future datatypes.
Thank you very much!
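For comparison, the hand-written alternative described in the comment might look roughly like this (hypothetical datatype and parser names, chosen to match the error message above, and reusing the Parser alias from the earlier sketch):

import Text.Megaparsec
import Text.Megaparsec.Char (char, string)

data Activation = Active | Inactive
  deriving (Eq, Show)

-- Accepts exactly the spellings the error message advertises:
-- "active", "inactive", '0', or '1'.
activationP :: Parser Activation
activationP = choice
  [ Active   <$ string "active"
  , Active   <$ char '1'
  , Inactive <$ string "inactive"
  , Inactive <$ char '0'
  ]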

Parsec returns [Char] instead of Text

I am trying to create a parser for a custom file format. In the format I am working with, some fields have a closing tag like so:
<SOL>
<DATE>0517
<YEAR>86
</SOL>
I am trying to grab the value between the </ and > and use it as part of the bigger parser.
I have come up with the code below. The trouble is, the parser returns [Char] instead of Text. I can pack each Char by doing fmap pack $ return r to get a Text value out, but I was hoping type inference would save me from having to do this. Could someone give hints as to why I am getting back [Char] instead of Text, and how I can get back Text without having to manually pack the value?
{-# LANGUAGE NoMonomorphismRestriction #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Text
import Text.Parsec
import Text.Parsec.Text
-- | A closing tag is on its own line and is a "</" followed by some
-- uppercase characters, followed by a '>'.
closingTag = do
  _ <- char '\n'
  r <- between (string "</") (char '>') (many upper)
  return r
string has the type
string :: Stream s m Char => String -> ParsecT s u m String
(See here for documentation)
So getting a String back is exactly what's supposed to happen.
Type inference doesn't change types, it only infers them. String is a concrete type, so there's no way to infer Text for it.
What you could do, if you need this in a couple of places, is to write a function
text :: Stream s m Char => String -> ParsecT s u m Text
text = fmap pack . string
or even
string' :: (IsString a, Stream s m Char) => String -> ParsecT s u m a
string' = fmap fromString . string
Also, it doesn't matter in this example, but you'd probably want to import Data.Text qualified; names like pack are used in a number of different modules.
As Ørjan Johansen correctly pointed out, string isn't actually the problem here, many upper is. The same principle applies though.
The reason you get [Char] here is that upper parses a Char and many turns that into a [Char]. I would write my own combinator along the lines of:
manyPacked = fmap pack . many
You could probably use type-level programming with type classes etc. to automatically choose between many and manyPacked depending on the expected return type, but I don't think that's worth it. (It would probably look a bit like Scala's CanBuildFrom.)
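With that combinator in place, the question's closingTag returns Text directly (a sketch using the question's own imports, where the Parser alias comes from Text.Parsec.Text):

closingTag :: Parser Text
closingTag = do
  _ <- char '\n'
  between (string "</") (char '>') (manyPacked upper)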

Using Uniplate in two-level tree type

I'm in the beginning stages of writing a parser for a C-like language in Haskell. I've got the AST data type down, and I'm playing around with it by writing some simple queries on the AST itself before I delve into the parser side of things.
My AST revolves around two types: statements (have no value, like an if/else) and expressions (have a value, like a literal or binary operation). So it looks something like this (vastly simplified, of course):
data Statement
  = Return Expression
  | If Expression Expression

data Expression
  = Literal Int
  | Variable String
  | Binary Expression Op Expression
Say I want to get the names of all variables used in an expression. With uniplate, it's easy:
varsInExpression exp = [s | Variable s <- universe exp]
But what if I want to find the variables used in a statement? Each constructor of Statement contains a nested Expression that I should apply varsInExpression to. So at the moment it looks like I'd have to pattern-match against every Statement constructor, which is exactly what uniplate is meant to avoid. Am I just not grokking the documentation well enough, or is this a limitation of uniplate (or am I doing it wrong)?
This seems like a good use-case for biplates. I'm relying on the slower Data.Data method, but it makes this code pretty trivial.
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data
import Data.Typeable
import Data.Generics.Uniplate.Data

data Statement
  = Return Expression
  | If Expression Expression
  deriving (Data, Typeable)

data Expression
  = Literal Int
  | Variable String
  | Binary Expression Int Expression
  deriving (Data, Typeable)

vars :: Statement -> [String]
vars stmt = [ s | Variable s <- universeBi stmt ]
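A quick check in GHCi, with an invented statement:

-- ghci> vars (If (Variable "x") (Return (Binary (Variable "y") 0 (Literal 1))))
-- ["x","y"]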
Basically, biplates are a generalization of uniplates where the target type isn't necessarily the same as the source, e.g.
biplate :: from -> (Str to, Str to -> from)

How to include code in different places during compilations in Haskell?

Quasi-quotes allow generating AST code at compile time, but the generated code is inserted at the place where the quasi-quote is written. Is it possible to insert compile-time generated code somewhere else? For example, in specific module files different from the one where the QQ is written? It would depend on a hard-coded module structure, but that's fine.
If that's not possible with QQ but anyone knows a different way of achieving it, I am open for suggestions.
To answer this, it's helpful to know what a quasi-quoter is. From the GHC Documentation, a quasi-quoter is a value of
data QuasiQuoter = QuasiQuoter { quoteExp  :: String -> Q Exp,
                                 quotePat  :: String -> Q Pat,
                                 quoteType :: String -> Q Type,
                                 quoteDec  :: String -> Q [Dec] }
That is, it's a parser from an arbitrary String to one or more of ExpQ, PatQ, TypeQ, and DecQ, which are Template Haskell representations of expressions, patterns, types, and declarations respectively.
When you use a quasi-quote, GHC applies the parser to the String to create an ExpQ (or other type), then splices in the resulting Template Haskell expression to produce an actual value.
It sounds like what you're asking to do is separate the quasiquote parsing and splicing, so that you have access to the TH expression. Then you can import that expression into another module and splice it there yourself.
Knowing the type of a quasi-quoter, it's readily apparent this is possible. Normally you use a QQ as
-- file Expr.hs
eval :: Expr -> Integer
expr = QuasiQuoter { quoteExp = parseExprExp, quotePat = parseExprPat }
-- file Foo.hs
import Expr
myInt = eval [expr|1 + 2|]
Instead, you can extract the parser yourself, get a TH expression, and splice it later:
-- file Foo.hs
import Expr
-- run the QQ parser
myInt_TH :: ExpQ
myInt_TH = quoteExp expr "1 + 2"
-- file Bar.hs
import Foo
-- run the TH splice
myInt = $(myInt_TH)
Of course if you're writing all this yourself, you can skip the quasi-quotes and use a parser and Template Haskell directly. It's pretty much the same thing either way.
