How can I write COMPLETE pragmas for types with many constructors? - haskell

Suppose I have a type with many constructors and a few pattern synonyms. I'd like to use pattern synonyms to replace a few of the constructors. How can I write the necessary COMPLETE pragma(s) without having to write out all the constructors by hand and risk falling behind if more are added?

Using the th-abstraction package, this is quite simple. Some throat clearing:
import Language.Haskell.TH.Datatype (DatatypeInfo (..), ConstructorInfo(..), reifyDatatype)
import Language.Haskell.TH (Q, Dec, Name, pragCompleteD)
import Data.List ((\\))
Using reifyDatatype, we can get info about a type, and extract a list of the names of its constructors. Then we simply need to add on the patterns we want and remove the constructors we don't want.
-- | Produce a #COMPLETE# pragma for a type with many constructors,
-- without having to list them all out.
--
-- #completeWithButWithout ''T ['P] ['C1, 'C2]# produces a #COMPLETE#
-- pragma stating that pattern matching on the type #T# is complete with
-- with the pattern #P# and with all the constructors of #T# other than
-- #C1# and #C2#.
completeWithButWithout :: Name -> [Name] -> [Name] -> Q [Dec]
completeWithButWithout ty extra_patterns excl_constrs = do
di <- reifyDatatype ty
let constrs = map constructorName (datatypeCons di)
(:[]) <$> pragCompleteD (extra_patterns ++ (constrs \\ excl_constrs))
(Just ty)
Now the module defining the datatype just needs to import this one, and say
data Foo = Bar' Int | Baz | Quux | ...
pattern Bar :: Char -> Foo
$(completeWithButWithout ''Foo ['Bar] ['Bar'])
I recommend invoking completeWithButWithout at the very end of the module, to prevent the splice from splitting the module.

Related

Type design for the AST of my language remembering token locations

I wrote a parser and evaluator for a simple programming language. Here is a simplified version of the types for the AST:
data Value = IntV Int | FloatV Float | BoolV Bool
data Expr = IfE Value [Expr] | VarDefE String Value
type Program = [Expr]
I want error messages to tell the line and column of the source code in which the error occured. For example, if the value in an If expression is not a boolean, I want the evaluator to show an error saying "expected boolean at line x, column y", with x and y referring to the location of the value.
So, what I need to do is redefine the previous types so that they can store the relevant locations of different things. One option would be to add a location to each constructor for expressions, like so:
type Location = (Int, Int)
data Expr = IfE Value [Expr] Location | VarDef String Value Location
This clearly isn't optimal, because I have to add those Location fields to every possible expression, and if for example a value contained other values, I would need to add locations to that value too:
{-
this would turn into FunctionCall String [Value] [Location],
with one location for each value in the function call
-}
data Value = ... | FunctionCall String [Value]
I came up with another solution, which allows me to add locations to everything:
data Located a = Located Location a
type LocatedExpr = Located Expr
type LocatedValue = Located Value
data Value = IntV Int | FloatV Float | BoolV Bool | FunctionCall String [LocatedValue]
data Expr = IfE LocatedValue [LocatedExpr] | VarDef String LocatedValue
data Program = [LocatedExpr]
However I don't like this that much. First of all, it clutters the definition of the evaluator and pattern matching has an extra layer every time. Also, I don't think saying that a function call takes located values as arguments is quite right. Function calls should take values as arguments, and locations should be metadata that doesn't interfere with the evaluator.
I need help redefining my types so that the solution is as clean as possible. Maybe there is a language extension or a design pattern I don't know about that could be helpful.
There are many ways to annotate an AST! This is half of what’s known as the AST typing problem, the other half being how you manage an AST that changes over the course of compilation. The problem isn’t exactly “solved”: all of the solutions have tradeoffs, and which one to pick depends on your expected use cases. I’ll go over a few that you might like to investigate at the end.
Whichever method you choose for organising the actual data types, if it makes pattern-matching ugly or unwieldy, the natural solution is PatternSynonyms.
Considering your first example:
{-# Language PatternSynonyms #-}
type Location = (Int, Int)
data Expr
= LocatedIf Value [Expr] Location
| LocatedVarDef String Value Location
-- Unidirectional pattern synonyms which ignore the location:
pattern If :: Value -> [Expr] -> Expr
pattern If val exprs <- LocatedIf val exprs _loc
pattern VarDef :: String -> Value -> Expr
pattern VarDef name expr <- LocatedVarDef name expr _loc
-- Inform GHC that matching ‘If’ and ‘VarDef’ is just as good
-- as matching ‘LocatedIf’ and ‘LocatedVarDef’.
{-# Complete If, VarDef #-}
This may be sufficiently tidy for your purposes already. But here are a few more tips that I find helpful.
Put annotations first: when adding an annotation type to an AST directly, I often prefer to place it as the first parameter of each constructor, so that it can be conveniently partially applied.
data LocatedExpr
= LocatedIf Location Value [Expr]
| LocatedVarDef Location String Value
If the annotation is a location, then this also makes it more convenient to obtain when writing certain kinds of parsers, along the lines of AnnotatedIf <$> (getSourceLocation <* ifKeyword) <*> value <*> many expr in a parser combinator library.
Parameterise your annotations: I often make the annotation type into a type parameter, so that GHC can derive some useful classes for me:
{-# Language
DeriveFoldable,
DeriveFunctor,
DeriveTraversable #-}
data AnnotatedExpr a
= AnnotatedIf a Value [Expr]
| AnnotatedVarDef a String Value
deriving (Functor, Foldable, Traversable)
type LocatedExpr = AnnotatedExpr Location
-- Get the annotation of an expression.
-- (Total as long as every constructor is annotated.)
exprAnnotation :: AnnotatedExpr a -> a
exprAnnotation = head
-- Update annotations purely.
mapAnnotations
:: (a -> b)
-> AnnotatedExpr a -> AnnotatedExpr b
mapAnnotations = fmap
-- traverse, foldMap, &c.
If you want “doesn’t interfere”, use polymorphism: you can enforce that the evaluator can’t inspect the annotation type by being polymorphic over it. Pattern synonyms still let you match on these expressions conveniently:
pattern If :: Value -> [AnnotatedExpr a] -> AnnotatedExpr a
pattern If val exprs <- AnnotatedIf _anno val exprs
-- …
eval :: AnnotatedExpr a -> Value
eval expr = case expr of
If val exprs -> -- …
VarDef name expr -> -- …
Unannotated terms aren’t your enemy: a term without source locations is no good for error reporting, but I think it’s still a good idea to make the pattern synonyms bidirectional for the convenience of constructing unannotated terms with a unit () annotation. (Or something equivalent, if you use e.g. Maybe Location as the annotation type.)
The reason is that this is quite convenient for writing unit tests, where you want to check the output, but want to use Eq instead of pattern matching, and don’t want to have to compare all the source locations in tests that aren’t concerned with them. Using the derived classes, void :: (Functor f) => f a -> f () strips out all the annotations on an AST.
import Control.Monad (void)
type BareExpr = AnnotatedExpr ()
-- One way to define bidirectional synonyms, so e.g.
-- ‘If’ can be used as either a pattern or a constructor.
pattern If :: Value -> [BareExpr] -> BareExpr
pattern If val exprs = AnnotatedIf () val exprs
-- …
stripAnnotations :: AnnotatedExpr a -> BareExpr
stripAnnotations = void
Equivalently, you could use GADTs / ExistentialQuantification to say data AnyExpr where { AnyExpr :: AnnotatedExpr a -> AnyExpr } / data AnyExpr = forall a. AnyExpr (AnnotatedExpr a); that way, the annotations have exactly as much information as (), but you don’t need to fmap over the entire tree with void in order to strip it, just apply the AnyExpr constructor to hide the type.
Finally, here are some brief introductions to a few AST typing solutions.
Annotate each AST node with a tag (e.g. a unique ID), then store all metadata like source locations, types, and whatever else, separately from the AST:
import Data.IntMap (IntMap)
-- More sophisticated/stronglier-typed tags are possible.
newtype Tag = Tag Int
newtype TagMap a = TagMap (IntMap a)
data Expr
= If !Tag Value [Expr]
| VarDef !Tag String Expr
type Span = (Location, Location)
type SourceMap = TagMap Span
type CommentMap = TagMap (Span, String)
parse
:: String -- Input
-> Either ParseError
( Expr -- Parsed expression
, SourceMap -- Source locations of tags
, CommentMap -- Sideband for comments
-- …
)
The advantage is that you can very easily mix in arbitrary new types of annotations anywhere, without affecting the AST itself, and avoid rewriting the AST just to change annotations. You can think of the tree and annotation tables as a kind of database, where the tags are the “foreign keys” relating them. A downside is that you must be careful to maintain these tags when you do rewrite the AST.
I don’t know if this approach has an established name; I think of it as just “tagging” or a “tagged AST”.
recursion-schemes and/or Data Types à la CartePDF: separate out the “recursive” part of an annotated expression tree from the “annotation” part, and use Fix to tie them back together, with Compose (or Cofree) to add annotations in the middle.
data ExprF e
= IfF Value [e]
| VarDefF String e
-- …
deriving (Foldable, Functor, Traversable, …)
-- Unannotated: Expr ~ ExprF (ExprF (ExprF (…)))
type Expr = Fix ExprF
-- With a location at each recursive step:
--
-- LocatedExpr ~ Located (ExprF (Located (ExprF (…))))
type LocatedExpr = Fix (Compose Located ExprF)
data Located a = Located Location a
deriving (Foldable, Functor, Traversable, …)
-- or: type Located = (,) Location
A distinct advantage is that you get a bunch of nice traversal stuff like cata for free-ish, so you can avoid having to write manual traversals over your AST over and over. A downside is that it adds some pattern clutter to clean up, as does the “à la carte” approach, but they do offer a lot of flexibility.
Trees That GrowPDF is overkill for just source locations, but in a serious compiler it’s quite helpful. If you expect to have more than one annotation type (such as inferred types or other analysis results) or an AST that changes over time, then you add a type parameter for the “compilation phase” (parsed, renamed, typechecked, desugared, &c.) and select field types or enable & disable constructors based on that index.
A really unfortunate downside of this is that you often have to rewrite the tree even in places nothing has changed, because everything depends on the “phase”. An alternative that I use is to add one type parameter for each type of phase or annotation that can vary independently, e.g. data Expr annotation termVarName typeVarName, and abstract over that with type and pattern synonyms. This lets you update indices independently and still use classes like Functor and Bitraversable.

Can I cast view patterns on the data inside a data constructor?

I have a convoluted mess of data constructors in my code that I wanted to straighten somewhat with the help of that new Pattern Synonyms language extension. However, it appears to me that I can't do it without a specialized view function that strips the specific data constructor at hand, only to apply some usual library functions to the data inside. I would expect that I could pattern match against the constructor and then apply that usual library function as a view, getting rid of the need to explicitly define the specialized view function. Alas, I couldn't figure out how to do that.
The less obvious downside of writing a specialized view function is that, if it fails pattern matching, there will be an error that mentions the view function's name, thus foiling the abstraction of the pattern synonym. So I can't just have a pattern matching failure in my view function, but I rather have to explicitly return a Maybe value so that I can then fail to match a Just in the pattern itself. This is exactly boilerplate.
I will appreciate any comments on how best to arrange the code in such circumstances, including whether it's actually a good practice to put pattern synonyms to use this way.
This is the example code I put together specifically for this question, very similar to that which I actually have to write. I tried my best to be concise. I also put comments and examples everywhere -- I hope it helps the reader more than it annoys. You can also run the code in ghci at once.
{-# LANGUAGE
PatternSynonyms
, ViewPatterns
#-}
module PatternSynonym where
-- ### Type definitions.
-- | This wrapper may have been introduced to harden type safety, or define a
-- typeclass instance. Its actual purpose does not matter for us here.
newtype Wrapped a = Wrap { unWrap :: a } deriving (Eq, Show)
-- | This data type exemplifies a non-trivial collection of things.
data MaybeThings a = Some [a] | None deriving (Eq, Show)
-- | This type synonym exemplifies a non-trivial collection of things that are
-- additionally wrapped.
type MaybeWrappedThings a = MaybeThings (Wrapped a)
-- ### An example of what we may want to do with our types:
-- | This is a function that does useful work on plain (not Wrapped!) Ints.
doSomething :: [Int] -> [Int]
doSomething = filter even
-- ^
-- λ doSomething [2, 3, 5]
-- [2]
-- | This is the example data we may have.
example :: MaybeWrappedThings Int
example = Some [Wrap 2, Wrap 3, Wrap 5]
-- | This is a function that must only accept a collection of wrapped things,
-- but it has to unwrap them in order to do something useful.
getThings :: MaybeWrappedThings Int -> [Int]
getThings (Some wxs) = doSomething xs
where xs = unWrap <$> wxs
getThings None = [ ]
-- ^
-- λ getThings example
-- [2]
-- ### An example of how we may simplify the same logic with the help of
-- pattern synonyms.
-- | This helper function is necessary (?) in order for it to be possible to
-- define the pattern synonym.
unWrapAll :: MaybeWrappedThings a -> Maybe [a]
unWrapAll (Some wxs) = Just (unWrap <$> wxs)
unWrapAll None = Nothing
-- | This is the pattern synonym that allows us to somewhat simplify getThings.
pattern Some' :: [a] -> MaybeWrappedThings a
pattern Some' xs <- (unWrapAll -> Just xs)
where Some' = Some . fmap Wrap
-- | This is how we may define our data with the pattern synonym.
-- It's got linearly less lexemes!
example' :: MaybeWrappedThings Int
example' = Some' [2, 3, 5]
-- ^
-- λ example == example'
-- True
-- | This is how our function may look with the pattern synonym.
getThings' :: MaybeWrappedThings Int -> [Int]
getThings' (Some' xs) = doSomething xs
getThings' None = [ ]
-- ^
-- λ getThings' example'
-- [2]
--
-- λ getThings' example' == getThings example
-- True
Is there anything Haskell cannot do?
I think this:
pattern Some' xs <- Some (coerce -> xs)
-- Is perfection.
Thanks to #chi for the coerce thing and to #li-yao-xia for the code that views the variable after matching with Some data constructor.
It's very intuitive, I just didn't even suspect Haskell can do this much.

What does (..) mean?

I'm trying to learn Haskell.
I'm reading the code on here[1]. I just copy and past some part of the code from lines:46 and 298-300.
Question: What does (..) mean?
I Hoggled it but I got no result.
module Pos.Core.Types(
-- something here
SharedSeed (..) -- what does this mean?
) where
newtype SharedSeed = SharedSeed
{ getSharedSeed :: ByteString
} deriving (Show, Eq, Ord, Generic, NFData, Typeable)
[1] https://github.com/input-output-hk/cardano-sl/blob/master/core/Pos/Core/Types.hs
The syntax of import/export lists has not much to do with the syntax of Haskell itself. It's just a comma-separated listing of everything you want to export from your module. Now, there's a problem there because Haskell really has two languages with symbols that may have the same name. This is particularly common with newtypes like the one in your example: you have a type-level name SharedSeed :: *, and also a value-level name (data constructor) SharedSeed :: ByteString -> SharedSeed.
This only happens with uppercase names, because lowercase at type level are always local type variables. Thus the convention in export lists that uppercase names refer to types.
But just exporting the type does not allow users to actually construct values of that type. That's often prudent: not all internal-representation values might make legal values of the newtype (see Bartek's example), so then it's better to only export a safe smart constructor instead of the unsafe data constructor.
But other times, you do want to make the data constructor available, certainly for multi-constructor types like Maybe. To that end, export lists have three syntaxes you can use:
module Pos.Core.Types(
SharedSeed (SharedSeed) will export the constructor SharedSeed. In this case that's of course the only constructor anyway, but if there were other constructors they would not be exported with this syntax.
SharedSeed (..) will export all constructors. Again, in this case that's only SharedSeed, but for e.g. Maybe it would export both Nothing and Just.
pattern SharedSeed will export ShareSeed as a standalone pattern (independent of the export of the ShareSeed type). This requires the -XPatternSynonyms extension.
)
It means "export all constructors and record fields for this data type".
When writing a module export list, there's three4 ways you can export a data type:
module ModuleD(
D, -- just the type, no constructors
D(..), -- the type and all its constructors
D(DA) -- the type and the specific constructor
) where
data D = DA A | DB B
If you don't export any constructors, the type, well, can't be constructed, at least directly. This is useful if you e.g. want to enforce some runtime invariants on the data type:
module Even (evenInt, toInt) where
newtype EvenInt = EvenInt Int deriving (Show, Eq)
evenInt :: Int -> Maybe EvenInt
evenInt x = if x `mod` 2 == 0 then Just x else Nothing
toInt :: EvenInt -> Int
toInt (EvenInt x) = x
The caller code can now use this type, but only in the allowed manner:
x = evenInt 2
putStrLn $ if isJust x then show . toInt . fromJust $ x else "Not even!"
As a side note, toInt is usually implemented indirectly via the record syntax for convenience:
data EvenInt = EvenInt { toInt :: Int }
4 See #leftaroundabout's answer

What's a better way of managing large Haskell records?

Replacing fields names with letters, I have cases like this:
data Foo = Foo { a :: Maybe ...
, b :: [...]
, c :: Maybe ...
, ... for a lot more fields ...
} deriving (Show, Eq, Ord)
instance Writer Foo where
write x = maybeWrite a ++
listWrite b ++
maybeWrite c ++
... for a lot more fields ...
parser = permute (Foo
<$?> (Nothing, Just `liftM` aParser)
<|?> ([], bParser)
<|?> (Nothing, Just `liftM` cParser)
... for a lot more fields ...
-- this is particularly hideous
foldl1 merge [foo1, foo2, ...]
merge (Foo a b c ...seriously a lot more...)
(Foo a' b' c' ...) =
Foo (max a a') (b ++ b') (max c c') ...
What techniques would allow me to better manage this growth?
In a perfect world a, b, and c would all be the same type so I could keep them in a list, but they can be many different types. I'm particularly interested in any way to fold the records without needing the massive patterns.
I'm using this large record to hold the different types resulting from permutation parsing the vCard format.
Update
I've implemented both the generics and the foldl approaches suggested below. They both work, and they both reduce three large field lists to one.
Datatype-generic programming techniques can be used to transform all the fields of a record in some "uniform" sort of way.
Perhaps all the fields in the record implement some typeclass that we want to use (the typical example is Show). Or perhaps we have another record of "similar" shape that contains functions, and we want to apply each function to the corresponding field of the original record.
For these kinds of uses, the generics-sop library is a good option. It expands the default Generics functionality of GHC with extra type-level machinery that provides analogues of functions like sequence or ap, but which work over all the fields of a record.
Using generics-sop, I tried to create a slightly less verbose version of your merge funtion. Some preliminary imports:
{-# language TypeOperators #-}
{-# language DeriveGeneric #-}
{-# language TypeFamilies #-}
{-# language DataKinds #-}
import Control.Applicative (liftA2)
import qualified GHC.Generics as GHC
import Generics.SOP
A helper function that lifts a binary operation to a form useable by the functions of generics-sop:
fn_2' :: (a -> a -> a) -> (I -.-> (I -.-> I)) a -- I is simply an Identity functor
fn_2' = fn_2 . liftA2
A general merge function that takes a vector of operators and works on any single-constructor record that derives Generic:
merge :: (Generic a, Code a ~ '[ xs ]) => NP (I -.-> (I -.-> I)) xs -> a -> a -> a
merge funcs reg1 reg2 =
case (from reg1, from reg2) of
(SOP (Z np1), SOP (Z np2)) ->
let npResult = funcs `hap` np1 `hap` np2
in to (SOP (Z npResult))
Code is a type family that returns a type-level list of lists describing the structure of a datatype. The outer list is for constructors, the inner lists contain the types of the fields for each constructor.
The Code a ~ '[ xs ] part of the constraint says "the datatype can only have one constructor" by requiring the outer list to have exactly one element.
The (SOP (Z _) pattern matches extract the (heterogeneus) vector of field values from the record's generic representation. SOP stands for "sum-of-products".
A concrete example:
data Person = Person
{
name :: String
, age :: Int
} deriving (Show,GHC.Generic)
instance Generic Person -- this Generic is from generics-sop
mergePerson :: Person -> Person -> Person
mergePerson = merge (fn_2' (++) :* fn_2' (+) :* Nil)
The Nil and :* constructors are used to build the vector of operators (the type is called NP, from n-ary product). If the vector doesn't match the number of fields in the record, the program won't compile.
Update. Given that the types in your record are highly uniform, an alternative way of creating the vector of operations is to define instances of an auxiliary typeclass for each field type, and then use the hcpure function:
class Mergeable a where
mergeFunc :: a -> a -> a
instance Mergeable String where
mergeFunc = (++)
instance Mergeable Int where
mergeFunc = (+)
mergePerson :: Person -> Person -> Person
mergePerson = merge (hcpure (Proxy :: Proxy Mergeable) (fn_2' mergeFunc))
The hcliftA2 function (that combines hcpure, fn_2 and hap) could be used to simplify things further.
Some suggestions:
(1) You can use the RecordWildCards extension to automatically
unpack a record into variables. Doesn't help if you need to unpack
two records of the same type, but it's a useful to keep in mind.
Oliver Charles has a nice blog post on it: (link)
(2) It appears your example application is performing a fold over the records.
Have a look at Gabriel Gonzalez's foldl package. There is also a blog post: (link)
Here is a example of how you might use it with a record like:
data Foo = Foo { _a :: Int, _b :: String }
The following code computes the maximum of the _a fields and the
concatenation of the _b_ fields.
import qualified Control.Foldl as L
import Data.Profunctor
data Foo = Foo { _a :: Int, _b :: String }
deriving (Show)
fold_a :: L.Fold Foo Int
fold_a = lmap _a (L.Fold max 0 id)
fold_b :: L.Fold Foo String
fold_b = lmap _b (L.Fold (++) "" id)
fold_foos :: L.Fold Foo Foo
fold_foos = Foo <$> fold_a <*> fold_b
theFoos = [ Foo 1 "a", Foo 3 "b", Foo 2 "c" ]
test = L.fold fold_foos theFoos
Note the use of the Profunctor function lmap to extract out
the fields we want to fold over. The expression:
L.Fold max 0 id
is a fold over a list of Ints (or any Num instance), and therefore:
lmap _a (L.Fold max 0 id)
is the same fold but over a list of Foo records where we use _a
to produce the Ints.

How does newtype help hiding anything?

Real world haskell says:
we will hide the details of our parser type using a newtype
declaration
I don't get how we can hide anything using the newtype. Can anyone elaborate? What are we trying to hide and how do we do it.
data ParseState = ParseState {
string :: L.ByteString
, offset :: Int64 -- imported from Data.Int
} deriving (Show)
newtype Parse a = Parse {
runParse :: ParseState -> Either String (a, ParseState)
}
The idea is to combine modules + newtypes to keep people from seeing the internals of how we implement things.
-- module A
module A (A, toA) where -- Notice we limit our exports
newtype A = A {unA :: Int}
toA :: Int -> A
toA = -- Do clever validation
-- module B
import A
foo :: A
foo = toA 1 -- Must use toA and can't see internals of A
This prevents from pattern matching and arbitrarily constructing A. This let's our A module make certain assumptions about A and also change the internals of A with impunity!
This is particularly nice because at runtime the newtypes are erased so there's almost no overhead from doing something like this
You hide details by not exporting stuff. So there's two comparisons to make. One is exported vs. not exported:
-- hidden: nothing you can do with "Parse a" values -- though
-- you can name their type
module Foo (Parse) where
newtype Parse a = Parse { superSecret :: a }
-- not hidden: outsiders can observe that a "Parse a" contains
-- exactly an "a", so they can do anything with a "Parse a" that
-- they can do with an "a"
module Foo (Parse(..)) where
newtype Parse a = Parse { superSecret :: a }
The other is more subtle, and is the one RWH is probably trying to highlight, and that is type vs. newtype:
-- hidden, as before
module Foo (Parse) where
newtype Parse a = Parse { superSecret :: a }
-- not hidden: it is readily observable that "Parse a" is identical
-- to "a", and moreover it can't be fixed because there's nothing
-- to hide
module Foo (Parse) where
type Parse a = a

Resources