How to pattern match an abstract data type when the data constructor isn't in scope? - haskell

I'm writing a parser library using Parsec combinators, and I want to unit test some of my parsers. So I have a simple parser:
dash :: GenParser Char st Char
dash = char '-'
I'd like to write some tests for it. The positive test is pretty easy:
spec :: Spec
spec = do
describe "dash" $ do
it "parses a dash" $
parse dash "N/A" "-" `shouldBe` (Right '-')
I'd like to write a negative test as well. When the parser doesn't match, it returns Left of a ParseError. I'd like to write a test that validates the exact message that the ParseError contains. So what I'd really like to do is something like
spec :: Spec
spec = do
describe "dash" $ do
it "doesn't parse an underscore" $
parse dash "N/A" "_" `shouldSatisfy` (hasErrorMessage "not a dash")
hasErrorMessage (Left (ParseError _ msgs)) expected = msg == expected
hasErrorMessage _ expected = False
But I'm having trouble writing this sort of code, since the ParseError data constructor isn't exported from Text.Parsec.Error.
Is there any way to use pattern matching on types where no data constructor for the type is in scope?
I know I could write hasErrorMessage something like
hasErrorMessage :: String -> (Either ParseError a) -> Bool
hasErrorMessage expected (Left pe) = elem expected $ fmap messageString (errorMessages pe)
but I'd like to understand this nuance, too.

Although the data constructor isn't exported, functions to access its parameters are. You can use these in combination with view patterns to sort of get what you want. In your case, the pattern (errorMessages -> msgs) can stand in almost perfectly for (ParseError _ msgs), with two caveats:
You need {-# LANGUAGE ViewPatterns #-} to use this feature.
errorMessages sorts the messages, which a pattern match on the data constructor wouldn't do.
You can even use this technique with pattern synonyms to make a fake data constructor, so you can use the exact same syntax you would otherwise:
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
pattern ParseError pos msgs <- ((,) <$> errorPos <*> errorMessages -> (pos, msgs)) where
ParseError pos msgs = foldr addErrorMessage (newErrorUnknown pos) msgs

Related

Names with single quotes in Template Haskell

Normally, when using Template Haskell, names of bindings and data constructors are quoted by prefixing them with a single quote:
showName, justName :: Name
showName = 'show
justName = 'Just
Unfortunately, this doesn't work for names whose second character is a single quote, because two single quotes with a single character between them are interpreted as a character literal. How can I work around this?
Edit: It seems the user's guide is wrong about there not being an escape mechanism! You can just add a space after the initial single quote. So... don't use the below hack.
It's possible to work around this limitation using expression quoting and a bogus Quote instance.
{-# LANGUAGE DerivingVia #-}
module ExtractName (extractName) where
import Data.Functor.Identity
import GHC.Stack
import Language.Haskell.TH.Syntax
extractName :: HasCallStack => Id Exp -> Name
extractName m = case unId m of
VarE x -> x
ConE x -> x
_ -> withFrozenCallStack $ error extractNameError
newtype Id a = Id {unId :: a}
deriving (Functor, Applicative, Monad) via Identity
-- This is bogus, but good enough for what we're doing.
instance Quote Id where
newName _ = withFrozenCallStack $ error extractNameError
extractNameError :: String
extractNameError =
"extractName: the argument must be an expression quote containing a\n"
++ "single bare name, such as [| f'1 |]"
Now you can write, for example,
f' :: Int
data Foo = B'ar
f'Name, b'arName :: Name
f'Name = extractName [| f' |]
b'arName = extractName [| B'ar |]
How does this work? An expression quote will produce an Exp in an arbitrary monad implementing Quote. In general, desugaring expression quotes may require the newName method, to desugar things like lets and lambda expressions. However, it does not need newName to desugar a plain old binding or data constructor. So we can write a bogus Quote implementation for an Identity-like type that will work for the sorts of quoted expressions we need. Once we've unwrapped the expression, we can extract the name from it.

Haskell: Matching String Prefixes against List

I've been learning some Haskell lately, and I thought a lexer might be a fun project. I'm using this ANSI C Yacc grammar as a guide.
The general program structure is:
lex :: [Char] -> Maybe [Token]
lex s =
case tokenize([], s) of
Just (tokens, []) -> Just tokens
_ -> Nothing
tokenize :: ([Token], [Char]) -> Maybe ([Token], [Char])
Where tokenize builds a list of tokens. I'm having trouble thinking of a suitable structure for tokenize. For example, to match keywords like int, I could write:
tokenize (toks, 'i':'n':'t':' ':rest) = tokenize (toks++[TokenKeyword IntK], rest)
But this seems like a terrible way to do things. Is there a way to pattern match against elements in a list? Could I create a list of all keywords, and attempt to match them as prefixes of the input string?
If you want to match based on a string prefix, you could use the ViewPatterns extension. This extension can be enabled by passing -XViewPatterns to the compiler, by running :set -XViewPatterns in ghci, or by putting {-# LANGUAGE ViewPatterns #-} at the top of the file.
Then, you can write a function matchPrefix (not 100% optimal, as it does iterate over prefix twice):
matchPrefix :: String -> String -> Maybe String
matchPrefix prefix result
| and (zipWith (==) prefix result) = Just (drop (length prefix) result)
| otherwise = Nothing
And then use it in a pattern like the following:
startsWithInt :: String -> Bool
startsWithInt (matchPrefix "int " -> Just rest) = True
startsWithInt _ = False
If you wanted to match based on a list of tokens, and get out the rest of the string and which token matched, you could do that by modifying matchPrefix to do that instead.

Converting Paulson's parser combinators to Haskell

I am trying to convert the code from Paulson's ML for the working programmer book chapter 9, Writing Interpreters for the λ-Calculus.
I was wondering if anyone can help me translate this to Haskell.
I'm struggling to understand the syntax.
fun list ph = ph -- repeat ("," $-- ph) >> (op::);
fun pack ph = "(" $-- list ph --$")" >> #1
| empty;
In porting this code to Haskell, I see two challenges: One is rewriting the combinators so they use the type Either SyntaxError rather than exceptions for flow control, and the other is preserving the modularity of ML's functors. That is, writing a parser combinator library that is modular with regards to what keywords / symbols / tokenizer it should use.
While the ML code has the two
functor Lexical (Keyword: KEYWORD) : LEXICAL
functor Parsing (Lex: LEXICAL) : PARSE
you could start by having
data Keyword = Keyword
{ alphas :: [String]
, symbols :: [String]
}
data Token
= Key String
| Id String
deriving (Show, Eq)
lex :: Keyword -> String -> [Token]
lex kw s = ...
where
alphaTok :: String -> Token
alphaTok a | a `elem` alphas kw = Key a
| otherwise = Id a
...
The ML code uses the types string and substring while Haskell's String is actually a [Char]. The lexer functions would look a little different because ML's String.getc could simply be the pattern match c : ss1 in Haskell, etc.
Paulson's parsers have type [Token] → (τ, [Token]) but allow for exceptions. The Haskell parsers could have type [Token] → Either SyntaxError (τ, [Token]):
newtype SyntaxError = SyntaxError String
deriving Show
newtype Parser a = Parser { runParser :: [Token] -> Either SyntaxError (a, [Token]) }
err :: String -> Either SyntaxError b
err msg = Left (SyntaxError msg)
The operators id, $, ||, !!, -- and >> need new names, since they collide with a bunch of built-in operators and single-line comments. Ideas for names could be: ident, kw, |||, +++ and >>>. I would skip implementing the !! operator initially.
Here are two combinators implemented a little differently,
ident :: Parser String
ident = Parser f
where
f :: [Token] -> Either SyntaxError (String, [Token])
f (Id x : toks) = Right (x, toks)
f (Key x : _) = err $ "Identifier expected, got keyword '" ++ x ++ "'"
f [] = err "Identifier expected, got EOF"
(+++) :: Parser a -> Parser b -> Parser (a, b)
(+++) pa pb = Parser $ \toks1 -> do (x, toks2) <- runP pa toks1
(y, toks3) <- runP pb toks2
return ((x, y), toks3)
...
Some final remarks:
Read the paper Monadic Parsing in Haskell (Hutton, Meijer).
You may be interested in SimpleParse by Ken Friis Larsen, an educational parser combinator library that is a simplification of ReadP by Koen Claessen, since its source code is very easy to read. They are both non-deterministic.
If you're interested in using parser combinators in Haskell, rather than porting some old-fashioned library for the learning experience, I encourage you too look at Megaparsec (tutorial), a modern fork of Parsec. The implementation is a little complex.
None of these three libraries (SimpleParse, ReadP, Megaparsec) split lexing and parsing into two separate steps. Rather, they simply build small tokenizing parsers that implicitly eat meaningless whitespace. (See the token combinator in SimpleParse, for example.) However, Megaparsec does allow an arbitrary token type, whether that is Char or some token you have lexed.

How to return a polymorphic type in Haskell based on the results of string parsing?

TL;DR:
How can I write a function which is polymorphic in its return type? I'm working on an exercise where the task is to write a function which is capable of analyzing a String and, depending on its contents, generate either a Vector [Int], Vector [Char] or Vector [String].
Longer version:
Here are a few examples of how the intended function would behave:
The string "1 2\n3 4" would generate a Vector [Int] that's made up of two lists: [1,2] and [3,4].
The string "'t' 'i' 'c'\n't' 'a' 'c'\n't' 'o' 'e'" would generate a Vector [Char] (i.e., made up of the lists "tic", "tac" and "toe").
The string "\"hello\" \"world\"\n\"monad\" \"party\"" would generate a Vector [String] (i.e., ["hello","world"] and ["monad","party"]).
Error-checking/exception handling is not a concern for this particular exercise. At this stage, all testing is done purely, i.e., this isn't in the realm of the IO monad.
What I have so far:
I have a function (and new datatype) which is capable of classifying a string. I also have functions (one for each Int, Char and String) which can convert the string into the necessary Vector.
My question: how can I combine these three conversion functions into a single function?
What I've tried:
(It obviously doesn't typecheck if I stuff the three conversion
functions into a single function (i.e., using a case..of structure
to pattern match on VectorType of the string.
I tried making a Vectorable class and defining a separate instance for each type; I quickly realized that this approach only works if the functions' arguments vary by type. In our case, the the type of the argument doesn't vary (i.e., it's always a String).
My code:
A few comments
Parsing: the mySplitter object and the mySplit function handle the parsing. It's admittedly a crude parser based on the Splitter type and the split function from Data.List.Split.Internals.
Classifying: The classify function is capable of determining the final VectorType based on the string.
Converting: The toVectorNumber, toVectorChar and toVectorString functions are able to convert a string to type Vector [Int], Vector [Char] and Vector [String], respectively.
As a side note, I'm trying out CorePrelude based on a recommendation from a mentor. That's why you'll see me use the generalized versions of the normal Prelude functions.
Code:
import qualified Prelude
import CorePrelude
import Data.Foldable (concat, elem, any)
import Control.Monad (mfilter)
import Text.Read (read)
import Data.Char (isAlpha, isSpace)
import Data.List.Split (split)
import Data.List.Split.Internals (Splitter(..), DelimPolicy(..), CondensePolicy(..), EndPolicy(..), Delimiter(..))
import Data.Vector ()
import qualified Data.Vector as V
data VectorType = Number | Character | TextString deriving (Show)
mySplitter :: [Char] -> Splitter Char
mySplitter elts = Splitter { delimiter = Delimiter [(`elem` elts)]
, delimPolicy = Drop
, condensePolicy = Condense
, initBlankPolicy = DropBlank
, finalBlankPolicy = DropBlank }
mySplit :: [Char]-> [Char]-> [[Char]]
mySplit delims = split (mySplitter delims)
classify :: String -> VectorType
classify xs
| '\"' `elem` cs = TextString
| hasAlpha cs = Character
| otherwise = Number
where
cs = concat $ split (mySplitter "\n") xs
hasAlpha = any isAlpha . mfilter (/=' ')
toRows :: [Char] -> [[Char]]
toRows = mySplit "\n"
toVectorChar :: [Char] -> Vector [Char]
toVectorChar = let toChar = concat . mySplit " \'"
in V.fromList . fmap (toChar) . toRows
toVectorNumber :: [Char] -> Vector [Int]
toVectorNumber = let toNumber = fmap (\x -> read x :: Int) . mySplit " "
in V.fromList . fmap toNumber . toRows
toVectorString :: [Char] -> Vector [[Char]]
toVectorString = let toString = mfilter (/= " ") . mySplit "\""
in V.fromList . fmap toString . toRows
You can't.
Covariant polymorphism is not supported in Haskell, and wouldn't be useful if it were.
That's basically all there is to answer. Now as to why this is so.
It's no good "returning a polymorphic value" like OO languages so like to do, because the only reason to return any value at all is to use it in other functions. Now, in OO languages you don't have functions but methods that come with the object, so it's quite easy to "return different types": each will have its suitable methods built-in, and they can per instance vary. (Whether that's a good idea is another question.)
But in Haskell, the functions come from elsewhere. They don't know about implementation changes for a particular instance, so the only way such functions can safely be defined is to know every possible implementation. But if your return type is really polymorphic, that's not possible, because polymorphism is an "open" concept (it allows new implementation varieties to be added any time later).
Instead, Haskell has a very convenient and totally safe mechanism of describing a closed set of "instances" – you've actually used it yourself already! ADTs.
data PolyVector = NumbersVector (Vector [Int])
| CharsVector (Vector [Char])
| StringsVector (Vector [String])
That's the return type you want. The function won't be polymorphic as such, it'll simply return a more versatile type.
If you insist it should be polymorphic
Now... actually, Haskell does have a way to sort-of deal with "polymorphic returns". As in OO when you declare that you return a subclass of a specified class. Well, you can't "return a class" at all in Haskell, you can only return types. But those can be made to express "any instance of...". It's called existential quantification.
{-# LANGUAGE GADTs #-}
data PolyVector' where
PolyVector :: YourVElemClass e => Vector [e] -> PolyVector'
class YourVElemClass where
...?
instance YourVElemClass Int
instance YourVElemClass Char
instance YourVElemClass String
I don't know if that looks intriguing to you. Truth is, it's much more complicated and rather harder to use; you can't just just any of the possible results directly but can only make use of the elements through methods of YourVElemClass. GADTs can in some applications be extremely useful, but these usually involve classes with very deep mathematical motivation. YourVElemClass doesn't seem to have such a motivation, so you'll be much better off with a simple ADT alternative, than existential quantification.
There's a famous rant against existentials by Luke Palmer (note he uses another syntax, existential-specific, which I consider obsolete, as GADTs are strictly more general).
Easy, use an sum type!
data ParsedVector = NumberVector (Vector [Int]) | CharacterVector (Vector [Char]) | TextString (Vector [String]) deriving (Show)
parse :: [Char] -> ParsedVector
parse cs = case classify cs of
Number -> NumberVector $ toVectorNumber cs
Character -> CharacterVector $ toVectorChar cs
TextString -> TextStringVector $ toVectorString cs

Parsec returns [Char] instead of Text

I am trying to create a parser for a custom file format. In the format I am working with, some fields have a closing tag like so:
<SOL>
<DATE>0517
<YEAR>86
</SOL>
I am trying to grab the value between the </ and > and use it as part of the bigger parser.
I have come up with the code below. The trouble is, the parser returns [Char] instead of Text. I can pack each Char by doing fmap pack $ return r to get a text value out, but I was hoping type inference would save me from having to do this. Could someone give hints as to why I am getting back [Char] instead of Text, and how I can get back Text without having to manually pack the value?
{-# LANGUAGE NoMonomorphismRestriction #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Text
import Text.Parsec
import Text.Parsec.Text
-- |A closing tag is on its own line and is a "</" followed by some uppercase characters
-- followed by some '>'
closingTag = do
_ <- char '\n'
r <- between (string "</") (char '>') (many upper)
return r
string has the type
string :: Stream s m Char => String -> ParsecT s u m String
(See here for documentation)
So getting a String back is exactly what's supposed to happen.
Type inference doesn't change types, it only infers them. String is a concrete type, so there's no way to infer Text for it.
What you could do, if you need this in a couple of places, is to write a function
text :: Stream s m Char => String -> ParsecT s u m Text
text = fmap pack . string
or even
string' :: (IsString a, Stream s m Char) => String -> ParsecT s u m a
string' = fmap fromString . string
Also, it doesn't matter in this example but you'd probably want to import Text qualified, names like pack are used in a number of different modules.
As Ørjan Johansen correctly pointed out, string isn't actually the problem here, many upper is. The same principle applies though.
The reason you get [Char] here is that upper parses a Char and many turns that into a [Char]. I would write my own combinator along the lines of:
manyPacked = fmap pack . many
You could probably use type-level programming with type classes etc. to automatically choose between many and manyPack depending on the expect return type, but I don't think that's worth it. (It would probably look a bit like Scala's CanBuiltFrom).

Resources