Type design for the AST of my language remembering token locations - haskell

I wrote a parser and evaluator for a simple programming language. Here is a simplified version of the types for the AST:
data Value = IntV Int | FloatV Float | BoolV Bool
data Expr = IfE Value [Expr] | VarDefE String Value
type Program = [Expr]
I want error messages to tell the line and column of the source code in which the error occured. For example, if the value in an If expression is not a boolean, I want the evaluator to show an error saying "expected boolean at line x, column y", with x and y referring to the location of the value.
So, what I need to do is redefine the previous types so that they can store the relevant locations of different things. One option would be to add a location to each constructor for expressions, like so:
type Location = (Int, Int)
data Expr = IfE Value [Expr] Location | VarDef String Value Location
This clearly isn't optimal, because I have to add those Location fields to every possible expression, and if for example a value contained other values, I would need to add locations to that value too:
{-
this would turn into FunctionCall String [Value] [Location],
with one location for each value in the function call
-}
data Value = ... | FunctionCall String [Value]
I came up with another solution, which allows me to add locations to everything:
data Located a = Located Location a
type LocatedExpr = Located Expr
type LocatedValue = Located Value
data Value = IntV Int | FloatV Float | BoolV Bool | FunctionCall String [LocatedValue]
data Expr = IfE LocatedValue [LocatedExpr] | VarDef String LocatedValue
data Program = [LocatedExpr]
However I don't like this that much. First of all, it clutters the definition of the evaluator and pattern matching has an extra layer every time. Also, I don't think saying that a function call takes located values as arguments is quite right. Function calls should take values as arguments, and locations should be metadata that doesn't interfere with the evaluator.
I need help redefining my types so that the solution is as clean as possible. Maybe there is a language extension or a design pattern I don't know about that could be helpful.

There are many ways to annotate an AST! This is half of what’s known as the AST typing problem, the other half being how you manage an AST that changes over the course of compilation. The problem isn’t exactly “solved”: all of the solutions have tradeoffs, and which one to pick depends on your expected use cases. I’ll go over a few that you might like to investigate at the end.
Whichever method you choose for organising the actual data types, if it makes pattern-matching ugly or unwieldy, the natural solution is PatternSynonyms.
Considering your first example:
{-# Language PatternSynonyms #-}
type Location = (Int, Int)
data Expr
= LocatedIf Value [Expr] Location
| LocatedVarDef String Value Location
-- Unidirectional pattern synonyms which ignore the location:
pattern If :: Value -> [Expr] -> Expr
pattern If val exprs <- LocatedIf val exprs _loc
pattern VarDef :: String -> Value -> Expr
pattern VarDef name expr <- LocatedVarDef name expr _loc
-- Inform GHC that matching ‘If’ and ‘VarDef’ is just as good
-- as matching ‘LocatedIf’ and ‘LocatedVarDef’.
{-# Complete If, VarDef #-}
This may be sufficiently tidy for your purposes already. But here are a few more tips that I find helpful.
Put annotations first: when adding an annotation type to an AST directly, I often prefer to place it as the first parameter of each constructor, so that it can be conveniently partially applied.
data LocatedExpr
= LocatedIf Location Value [Expr]
| LocatedVarDef Location String Value
If the annotation is a location, then this also makes it more convenient to obtain when writing certain kinds of parsers, along the lines of AnnotatedIf <$> (getSourceLocation <* ifKeyword) <*> value <*> many expr in a parser combinator library.
Parameterise your annotations: I often make the annotation type into a type parameter, so that GHC can derive some useful classes for me:
{-# Language
DeriveFoldable,
DeriveFunctor,
DeriveTraversable #-}
data AnnotatedExpr a
= AnnotatedIf a Value [Expr]
| AnnotatedVarDef a String Value
deriving (Functor, Foldable, Traversable)
type LocatedExpr = AnnotatedExpr Location
-- Get the annotation of an expression.
-- (Total as long as every constructor is annotated.)
exprAnnotation :: AnnotatedExpr a -> a
exprAnnotation = head
-- Update annotations purely.
mapAnnotations
:: (a -> b)
-> AnnotatedExpr a -> AnnotatedExpr b
mapAnnotations = fmap
-- traverse, foldMap, &c.
If you want “doesn’t interfere”, use polymorphism: you can enforce that the evaluator can’t inspect the annotation type by being polymorphic over it. Pattern synonyms still let you match on these expressions conveniently:
pattern If :: Value -> [AnnotatedExpr a] -> AnnotatedExpr a
pattern If val exprs <- AnnotatedIf _anno val exprs
-- …
eval :: AnnotatedExpr a -> Value
eval expr = case expr of
If val exprs -> -- …
VarDef name expr -> -- …
Unannotated terms aren’t your enemy: a term without source locations is no good for error reporting, but I think it’s still a good idea to make the pattern synonyms bidirectional for the convenience of constructing unannotated terms with a unit () annotation. (Or something equivalent, if you use e.g. Maybe Location as the annotation type.)
The reason is that this is quite convenient for writing unit tests, where you want to check the output, but want to use Eq instead of pattern matching, and don’t want to have to compare all the source locations in tests that aren’t concerned with them. Using the derived classes, void :: (Functor f) => f a -> f () strips out all the annotations on an AST.
import Control.Monad (void)
type BareExpr = AnnotatedExpr ()
-- One way to define bidirectional synonyms, so e.g.
-- ‘If’ can be used as either a pattern or a constructor.
pattern If :: Value -> [BareExpr] -> BareExpr
pattern If val exprs = AnnotatedIf () val exprs
-- …
stripAnnotations :: AnnotatedExpr a -> BareExpr
stripAnnotations = void
Equivalently, you could use GADTs / ExistentialQuantification to say data AnyExpr where { AnyExpr :: AnnotatedExpr a -> AnyExpr } / data AnyExpr = forall a. AnyExpr (AnnotatedExpr a); that way, the annotations have exactly as much information as (), but you don’t need to fmap over the entire tree with void in order to strip it, just apply the AnyExpr constructor to hide the type.
Finally, here are some brief introductions to a few AST typing solutions.
Annotate each AST node with a tag (e.g. a unique ID), then store all metadata like source locations, types, and whatever else, separately from the AST:
import Data.IntMap (IntMap)
-- More sophisticated/stronglier-typed tags are possible.
newtype Tag = Tag Int
newtype TagMap a = TagMap (IntMap a)
data Expr
= If !Tag Value [Expr]
| VarDef !Tag String Expr
type Span = (Location, Location)
type SourceMap = TagMap Span
type CommentMap = TagMap (Span, String)
parse
:: String -- Input
-> Either ParseError
( Expr -- Parsed expression
, SourceMap -- Source locations of tags
, CommentMap -- Sideband for comments
-- …
)
The advantage is that you can very easily mix in arbitrary new types of annotations anywhere, without affecting the AST itself, and avoid rewriting the AST just to change annotations. You can think of the tree and annotation tables as a kind of database, where the tags are the “foreign keys” relating them. A downside is that you must be careful to maintain these tags when you do rewrite the AST.
I don’t know if this approach has an established name; I think of it as just “tagging” or a “tagged AST”.
recursion-schemes and/or Data Types à la CartePDF: separate out the “recursive” part of an annotated expression tree from the “annotation” part, and use Fix to tie them back together, with Compose (or Cofree) to add annotations in the middle.
data ExprF e
= IfF Value [e]
| VarDefF String e
-- …
deriving (Foldable, Functor, Traversable, …)
-- Unannotated: Expr ~ ExprF (ExprF (ExprF (…)))
type Expr = Fix ExprF
-- With a location at each recursive step:
--
-- LocatedExpr ~ Located (ExprF (Located (ExprF (…))))
type LocatedExpr = Fix (Compose Located ExprF)
data Located a = Located Location a
deriving (Foldable, Functor, Traversable, …)
-- or: type Located = (,) Location
A distinct advantage is that you get a bunch of nice traversal stuff like cata for free-ish, so you can avoid having to write manual traversals over your AST over and over. A downside is that it adds some pattern clutter to clean up, as does the “à la carte” approach, but they do offer a lot of flexibility.
Trees That GrowPDF is overkill for just source locations, but in a serious compiler it’s quite helpful. If you expect to have more than one annotation type (such as inferred types or other analysis results) or an AST that changes over time, then you add a type parameter for the “compilation phase” (parsed, renamed, typechecked, desugared, &c.) and select field types or enable & disable constructors based on that index.
A really unfortunate downside of this is that you often have to rewrite the tree even in places nothing has changed, because everything depends on the “phase”. An alternative that I use is to add one type parameter for each type of phase or annotation that can vary independently, e.g. data Expr annotation termVarName typeVarName, and abstract over that with type and pattern synonyms. This lets you update indices independently and still use classes like Functor and Bitraversable.

Related

How do you model "metadata" in Haskell?

I'm writing a parser in Haskell (mostly just to learn). I have a working tokenizer and parser and I want to add line numbers when giving an error message. I have this type:
data Token = Lambda
| Dot
| LParen
| RParen
| Ident String
Back in OO land, I would just create a Metadata object that holds the token's position in the source code. So I could try this:
data Metadata = Pos String Int Int
Then, I could change Token to
data Token = Lambda Metadata
| Dot Metadata
| LParen Metadata
| RParen Metadata
| Ident String Metadata
However, my parser is written using pattern matching on the tokens. So now, all my pattern matching is broken because I need to also account for the Metadata. So that doesn't seem ideal. 99% of the time, I don't care about the Metadata.
So what's the "right" way to do what I want to do?
There’s a wide array of approaches to the design of syntax representations in Haskell, but I can offer some recommendations and reasoning.
It’s advisable to keep metadata annotations out of the Token type, so that it sticks to a single responsibility. If a Token represents just a token, its derived instances for Eq and so on will work as expected without needing to worry about when to ignore the annotation.
Thankfully, the alternatives are simple in this case. One option is to move the annotation info to a separate wrapper type.
-- An #'Anno' a# is a value of type #a# annotated with some 'Metadata'.
data Anno a = Anno { annotation :: Metadata, item :: a }
deriving
( Eq
, Ord
, Show
-- …
)
Now the tokeniser can return a sequence of annotated tokens, i.e. [Annotated Token]. You still need to update the use sites, but the changes are now much simpler. And you can ignore annotations in various ways:
-- Positional matching
f1 (Anno _meta (Ident name)) = …
-- Record matching
f2 Anno { item = Ident name } = …
-- With ‘NamedFieldPuns’
f3 Anno { item } = …
-- 'U'nannotated value; with ‘PatternSynonyms’
pattern U :: a -> Anno a
pattern U x <- Anno _meta x
f4 (U LParen) = …
You can deannotate a sequence of tokens with fmap item to reuse existing code that doesn’t care about location info. And since Anno is a type of kind Type -> Type, GHC can also derive Foldable, Functor, and Traversable for it, making it easy to operate on the annotated item with e.g. fmap and traverse.
This is the preferable approach for Token, but for a parsed AST containing annotations, you may want to make the annotation type a parameter of the AST type, for example:
data Expr a = Add a (Expr a) (Expr a) | Literal a Int
deriving (Eq, Foldable, Functor, Ord, Show, Traversable)
Then you can use Expr Metadata for an annotated term, or Expr () for an unannotated one. To compare terms for equality, such as in unit tests, you can use the Functor instance to strip out the annotations, e.g. void expr1 == void expr2, where void is equivalent to fmap (\ _meta -> ()) here.
In a larger codebase, if there’s a lot of code depending on a data type and you really want to avoid updating it all at once, you can wrap the old type in a module that exports a pattern synonym for each of the old constructors. This lets you gradually update the old code before deleting the adapter module.
Culturally, it’s typical in a self-contained Haskell codebase to simply make breaking changes, and let the compiler tell you everywhere that needs to be updated, since it’s so easy to do extensive refactoring with high assurance that it’s correct. We’re more concerned with backward compatibility when it comes to published library code, since that actually affects other people.

Subset algebraic data type, or type-level set, in Haskell

Suppose you have a large number of types and a large number of functions that each return "subsets" of these types.
Let's use a small example to make the situation more explicit. Here's a simple algebraic data type:
data T = A | B | C
and there are two functions f, g that return a T
f :: T
g :: T
For the situation at hand, assume it is important that f can only return a A or B and g can only return a B or C.
I would like to encode this in the type system. Here are a few reasons/circumstances why this might be desirable:
Let the functions f and g have a more informative signature than just ::T
Enforce that implementations of f and g do not accidentally return a forbidden type that users of the implementation then accidentally use
Allow code reuse, e.g. when helper functions are involved that only operate on subsets of type T
Avoid boilerplate code (see below)
Make refactoring (much!) easier
One way to do this is to split up the algebraic datatype and wrap the individual types as needed:
data A = A
data B = B
data C = C
data Retf = RetfA A | RetfB B
data Retg = RetgB B | RetgC C
f :: Retf
g :: Retg
This works, and is easy to understand, but carries a lot of boilerplate for frequent unwrapping of the return types Retf and Retg.
I don't see polymorphism being of any help, here.
So, probably, this is a case for dependent types. It's not really a type-level list, rather a type-level set, but I've never seen a type-level set.
The goal, in the end, is to encode the domain knowledge via the types, so that compile-time checks are available, without having excessive boilerplate. (The boilerplate gets really annoying when there are lots of types and lots of functions.)
Define an auxiliary sum type (to be used as a data kind) where each branch corresponds to a version of your main type:
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE DataKinds #-}
import Data.Kind
import Data.Void
import GHC.TypeLits
data Version = AllEnabled | SomeDisabled
Then define a type family that maps the version and the constructor name (given as a type-level Symbol) to the type () if that branch is allowed, and to the empty type Void if it's disallowed.
type Enabled :: Version -> Symbol -> Type
type family Enabled v ctor where
Enabled SomeDisabled "C" = Void
Enabled _ _ = ()
Then define your type as follows:
type T :: Version -> Type
data T v = A !(Enabled v "A")
| B !(Enabled v "B")
| C !(Enabled v "C")
(The strictness annotations are there to help the exhaustivity checker.)
Typeclass instances can be derived, but separately for each version:
deriving instance Show (T AllEnabled)
deriving instance Eq (T AllEnabled)
deriving instance Show (T SomeDisabled)
deriving instance Eq (T SomeDisabled)
Here's an example of use:
noC :: T SomeDisabled
noC = A ()
main :: IO ()
main = print $ case noC of
A _ -> "A"
B _ -> "B"
-- this doesn't give a warning with -Wincomplete-patterns
This solution makes pattern-matching and construction more cumbersome, because those () are always there.
A variation is to have one type family per branch (as in Trees that Grow) instead of a two-parameter type family.
I tried to achieve something like this in the past, but without much success -- I was not too satisfied with my solution.
Still, one can use GADTs to encode this constraint:
data TagA = IsA | NotA
data TagC = IsC | NotC
data T (ta :: TagA) (tc :: TagC) where
A :: T 'IsA 'NotC
B :: T 'NotA 'NotC
C :: T 'NotA 'IsC
-- existential wrappers
data TnotC where TnotC :: T ta 'NotC -> TnotC
data TnotA where TnotA :: T 'NotA tc -> TnotA
f :: TnotC
g :: TnotA
This however gets boring fast, because of the wrapping/unwrapping of the exponentials. Consumer functions are more convenient since we can write
giveMeNotAnA :: T 'NotA tc -> Int
to require anything but an A. Producer functions instead need to use existentials.
In a type with many constructors, it also gets inconvenient since we have to use a GADT with many tags/parameters. Maybe this can be streamlined with some clever typeclass machinery.
Giving each individual value its own type scales extremely badly, and is quite unnecessarily fine-grained.
What you probably want is just restrict the types by some property on their values. In e.g. Coq, that would be a subset type:
Inductive T: Type :=
| A
| B
| C.
Definition Retf: Type := { x: T | x<>C }.
Definition Retg: Type := { x: T | x<>A }.
Well, Haskell has no way of expressing such value constraints, but that doesn't stop you from creating types that conceptually fulfill them. Just use newtypes:
newtype Retf = Retf { getRetf :: T }
mkRetf :: T -> Maybe Retf
mkRetf C = Nothing
mkRetf x = Retf x
newtype Retg = Retg { getRetg :: T }
mkRetg :: ...
Then in the implementation of f, you match for the final result of mkRetf and raise an error if it's Nothing. That way, an implementation mistake that makes it give a C will unfortunately not give a compilation error, but at least a runtime error from within the function that's actually at fault, rather than somewhere further down the line.
An alternative that might be ideal for you is Liquid Haskell, which does support subset types. I can't say too much about it, but it's supposedly pretty good (and will in new GHC versions have direct support).

Haskell: function that takes a type argument and returns a value depending on that type?

The question is basically: how do I write a function f in Haskell that takes a value x and a type argument T, and then returns a value y = f x T which depends both on x and T, without explicitly ascribing the type of the entire expression f x T? (The f x T is not valid Haskell, but a placeholder-pseudo-syntax).
Consider the following situation. Suppose that I have a typeclass Transform a b which provides a single function transform :: a -> b. Suppose that I also have a bunch of instances of Transform for various combinations of types a b. Now I'd like to chain multiple transform-functions together. However, I want the Transform-instance to be selected depending on the previosly constructed chain and on the next type in the chain of transformations. Ideally, this would give me something like this (with hypothetical functions source and migrate and invalid syntax << >> for "passing type parameters"; migrate is used as infix-operation):
z = source<<A>> migrate <<B>> ... migrate <<Z>>
Here, source somehow generates values of type A, and each migrate<<T>> is supposed to find an instance Transform S T and append it to the chain.
What I came up with so far: It actually (almost) works in Haskell using type ascriptions. Consider the following (compilable) example:
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE ExistentialQuantification #-}
-- compiles with:
-- The Glorious Glasgow Haskell Compilation System, version 8.2.2
-- A typeclass with two type-arguments
class Transform a b where
transform :: a -> b
-- instances of `T` forming a "diamond"
--
-- String
-- / \
-- / \
-- / \
-- / \
-- Double Rational
-- \ /
-- \ /
-- \ /
-- \ /
-- Int
--
instance Transform String Double where
transform = read
instance Transform String Rational where
transform = read -- turns out to be same as fo `Double`, but pretend it's different
instance Transform Double Int where
transform = round
instance Transform Rational Int where
transform = round -- pretend it's different from `Double`-version
-- A `MigrationPath` to `b` is
-- essentially some data source and
-- a chain of transformations
-- supplied by typeclass `T`
--
-- The `String` here is a dummy for a more
-- complex operation that is roughly `a -> b`
data MigrationPath b = Source b
| forall a . Modify (MigrationPath a) (a -> b)
-- A function that appends a transformation `T` from `a` to `b`
-- to a `MigrationPath a`
migrate :: Transform a b => MigrationPath a -> MigrationPath b
migrate mpa = Modify mpa transform
-- Build two paths along the left and right side
-- of the diamond
leftPath :: MigrationPath Int
leftPath = migrate ((migrate ((Source "3.333") :: (MigrationPath String))) :: (MigrationPath Double))
rightPath :: MigrationPath Int
rightPath = migrate((migrate ((Source "10/3") :: (MigrationPath String))) :: (MigrationPath Rational))
main = putStrLn "it compiles, ship it"
In this example, we define Transform instances such that they form two possible MigrationPaths from String to Int. Now, we (as a human beings) want to exercise our free will, and force the compiler to pick either the left path, or the right path in this chain of transformations.
This is even kind-of possible in this case. We can force the compiler to create the right chain by constructing an "onion" of constraints from type ascriptions:
leftPath :: MigrationPath Int
leftPath = migrate ((migrate ((Source "3.333") :: (MigrationPath String))) :: (MigrationPath Double))
However, I find it very sub-optimal for two reasons:
The AST (migrate ... (Type)) grows to both sides around the Source (this is a minor issue, it probably can be rectified using infix operators with left-associativity).
More severe: if the type of MigrationPath stored not only the target type, but also the source type, with the type-ascription approach we would have to repeat every type in the chain twice, which would make the entire approach too awkward to use.
Question: is there any way to construct the above chain of transformations in such a way that only "the next type", and not the entire "type of the MigrationPath T" has to be ascribed?
What I'm not asking: It is clear to me that in the above toy-example, it would be easier to define functions transformStringToInt :: String -> Int etc, and then just chain them together using .. This is not the question. The question is: how do I force the compiler to generate the expressions corresponding to transformStringToInt when I specify just the type. In the actual application, I want to specify only the types, and use a set of rather complicated rules to derive an appropriate instance with the right transform-function.
(Optional): Just to give an impression of what I'm looking for. Here is a completely analogous example from Scala:
// typeclass providing a transformation from `X` to `Y`
trait Transform[X, Y] {
def transform(x: X): Y
}
// Some data migration path ending with `X`
sealed trait MigrationPath[X] {
def migrate[Y](implicit t: Transform[X, Y]): MigrationPath[Y] = Migrate(this, t)
}
case class Source[X](x: X) extends MigrationPath[X]
case class Migrate[A, X](a: MigrationPath[A], t: Transform[A, X]) extends MigrationPath[X]
// really bad implementation of fractions
case class Q(num: Int, denom: Int) {
def toInt: Int = num / denom
}
// typeclass instances for various type combinations
implicit object TransformStringDouble extends Transform[String, Double] {
def transform(s: String) = s.toDouble
}
implicit object TransformStringQ extends Transform[String, Q] {
def transform(s: String) = Q(s.split("/")(0).toInt, s.split("/")(1).toInt)
}
implicit object TransformDoubleInt extends Transform[Double, Int] {
def transform(d: Double) = d.toInt
}
implicit object TransformQInt extends Transform[Q, Int] {
def transform(q: Q) = q.toInt
}
// constructing migration paths that yield `Int`
val leftPath = Source("3.33").migrate[Double].migrate[Int]
val rightPath = Source("10/3").migrate[Q].migrate[Int]
Notice how migrate-method requires nothing but the "next type", not the type ascription for the entire expression constructed so far.
Related: I want to note that this question is not an exact duplicate of "Pass Types as arguments to a function in Haskell?". My use case is a bit different. I also tend to disagree with the answers there that "it's not possible / you don't need it", because I actually do have a solution, it's just rather ugly from the purely syntactical point of view.
Use the TypeApplications language extension, which allows you to explicitly instantiate individual type variables. The following code seems to have the flavor you want, and it typechecks:
{-# LANGUAGE ExplicitForAll, FlexibleInstances, MultiParamTypeClasses, TypeApplications #-}
class Transform a b where
transform :: a -> b
instance Transform String Double where
transform = read
instance Transform String Rational where
transform = read
instance Transform Double Int where
transform = round
instance Transform Rational Int where
transform = round
transformTo :: forall b a. Transform a b => a -> b
transformTo = transform
stringToInt1 :: String -> Int
stringToInt1 = transform . transformTo #Double
stringToInt2 :: String -> Int
stringToInt2 = transform . transformTo #Rational
The definition transformTo uses an explicit use of forall to flip b and a so that TypeApplications will instantiate b first.
Use the type applications syntax extension.
> :set -XTypeApplications
> transform #_ #Int (transform #_ #Double "9007199254740993")
9007199254740992
> transform #_ #Int (transform #_ #Rational "9007199254740993%1")
9007199254740993
Inputs carefully chosen to give the lie to your "turns out to be the same as for Double" comment, even after correcting for syntax differences in the input.

What does (..) mean?

I'm trying to learn Haskell.
I'm reading the code on here[1]. I just copy and past some part of the code from lines:46 and 298-300.
Question: What does (..) mean?
I Hoggled it but I got no result.
module Pos.Core.Types(
-- something here
SharedSeed (..) -- what does this mean?
) where
newtype SharedSeed = SharedSeed
{ getSharedSeed :: ByteString
} deriving (Show, Eq, Ord, Generic, NFData, Typeable)
[1] https://github.com/input-output-hk/cardano-sl/blob/master/core/Pos/Core/Types.hs
The syntax of import/export lists has not much to do with the syntax of Haskell itself. It's just a comma-separated listing of everything you want to export from your module. Now, there's a problem there because Haskell really has two languages with symbols that may have the same name. This is particularly common with newtypes like the one in your example: you have a type-level name SharedSeed :: *, and also a value-level name (data constructor) SharedSeed :: ByteString -> SharedSeed.
This only happens with uppercase names, because lowercase at type level are always local type variables. Thus the convention in export lists that uppercase names refer to types.
But just exporting the type does not allow users to actually construct values of that type. That's often prudent: not all internal-representation values might make legal values of the newtype (see Bartek's example), so then it's better to only export a safe smart constructor instead of the unsafe data constructor.
But other times, you do want to make the data constructor available, certainly for multi-constructor types like Maybe. To that end, export lists have three syntaxes you can use:
module Pos.Core.Types(
SharedSeed (SharedSeed) will export the constructor SharedSeed. In this case that's of course the only constructor anyway, but if there were other constructors they would not be exported with this syntax.
SharedSeed (..) will export all constructors. Again, in this case that's only SharedSeed, but for e.g. Maybe it would export both Nothing and Just.
pattern SharedSeed will export ShareSeed as a standalone pattern (independent of the export of the ShareSeed type). This requires the -XPatternSynonyms extension.
)
It means "export all constructors and record fields for this data type".
When writing a module export list, there's three4 ways you can export a data type:
module ModuleD(
D, -- just the type, no constructors
D(..), -- the type and all its constructors
D(DA) -- the type and the specific constructor
) where
data D = DA A | DB B
If you don't export any constructors, the type, well, can't be constructed, at least directly. This is useful if you e.g. want to enforce some runtime invariants on the data type:
module Even (evenInt, toInt) where
newtype EvenInt = EvenInt Int deriving (Show, Eq)
evenInt :: Int -> Maybe EvenInt
evenInt x = if x `mod` 2 == 0 then Just x else Nothing
toInt :: EvenInt -> Int
toInt (EvenInt x) = x
The caller code can now use this type, but only in the allowed manner:
x = evenInt 2
putStrLn $ if isJust x then show . toInt . fromJust $ x else "Not even!"
As a side note, toInt is usually implemented indirectly via the record syntax for convenience:
data EvenInt = EvenInt { toInt :: Int }
4 See #leftaroundabout's answer

Data type design in Haskell

Learning Haskell, I write a formatter of C++ header files. First, I parse all class members into a-collection-of-class-members which is then passed to the formatting routine. To represent class members I have
data ClassMember = CmTypedef Typedef |
CmMethod Method |
CmOperatorOverload OperatorOverload |
CmVariable Variable |
CmFriendClass FriendClass |
CmDestructor Destructor
(I need to classify the class members this way because of some peculiarities of the formatting style.)
The problem that annoys me is that to "drag" any function defined for the class member types to the ClassMember level, I have to write a lot of redundant code. For example,
instance Formattable ClassMember where
format (CmTypedef td) = format td
format (CmMethod m) = format m
format (CmOperatorOverload oo) = format oo
format (CmVariable v) = format v
format (CmFriendClass fc) = format fc
format (CmDestructor d) = format d
instance Prettifyable ClassMember where
-- same story here
On the other hand, I would definitely like to have a list of ClassMember objects (at least, I think so), hence defining it as
data ClassMember a = ClassMember a
instance Formattable ClassMember a
format (ClassMember a) = format a
doesn't seem to be an option.
The alternatives I'm considering are:
Store in ClassMember not object instances themselves, but functions defined on the corresponding types, which are needed by the formatting routine. This approach breaks the modularity, IMO, as the parsing results, represented by [ClassMember], need to be aware of all their usages.
Define ClassMember as an existential type, so [ClassMember] is no longer a problem. I doubt whether this design is strict enough and, again, I need to specify all constraints in the definition, like data ClassMember = forall a . Formattable a => ClassMember a. Also, I would prefer a solution without using extensions.
Is what I'm doing a proper way to do it in Haskell or there is a better way?
First, consider trimming down that ADT a bit. Operator overloads and destructors are special kinds of methods, so it might make more sense to treat all three in CmMethod; Method will then have special ways to separate them. Alternatively, keep all three CmMethod, CmOperatorOverload, and CmDestructor, but let them all contain the same Method type.
But of course, you can reduce the complexity only so much.
As for the specific example of a Show instance: you really don't want to write that yourself except in some special cases. For your case, it's much more reasonable to have the instance derived automatically:
data ClassMember = CmTypedef Typedef
| CmMethod Method
| ...
| CmDestructor Destructor
deriving (Show)
This will give different results from your custom instance – because yours is wrong: showing a contained result should also give information about the constructor.
If you're not really interested in Show but talking about another class C that does something more specific to ClassMembers – well, then you probably shouldn't have defined C in the first place! The purpose of type classes is to express mathematical concepts that hold for a great variety of types.
A possible solution is to use records.
It can be used without extensions and preserves flexibility.
There is still some boilerplate code, but you need to type it only once for all. So if you would need to perform another set of operations over your ClassMember, it would be very easy and quick to do it.
Here is an example for your particular case (template Haskell and Control.Lens makes things easier but are not mandatory):
{-# LANGUAGE TemplateHaskell #-}
module Test.ClassMember
import Control.Lens
-- | The class member as initially defined.
data ClassMember =
CmTypedef Typedef
| CmMethod Method
| CmOperatorOverload OperatorOverload
| CmVariable Variable
| CmFriendClass FriendClass
| CmDestructor Destructor
-- | Some dummy definitions of the data types, so the code will compile.
data Typedef = Typedef
data Method = Method
data OperatorOverload = OperatorOverload
data Variable = Variable
data FriendClass = FriendClass
data Destructor = Destructor
{-|
A data type which defines one function per constructor.
Note the type a, which means that for a given Hanlder "a" all functions
must return "a" (as for a type class!).
-}
data Handler a = Handler
{
_handleType :: Typedef -> a
, _handleMethod :: Method -> a
, _handleOperator :: OperatorOverload -> a
, _handleVariable :: Variable -> a
, _handleFriendClass :: FriendClass -> a
, _handleDestructor :: Destructor -> a
}
{-|
Here I am using lenses. This is not mandatory at all, but makes life easier.
This is also the reason of the TemplateHaskell language pragma above.
-}
makeLenses ''Handler
{-|
A function acting as a dispatcher (the boilerplate code!!!), telling which
function of the handler must be used for a given constructor.
-}
handle :: Handler a -> ClassMember -> a
handle handler member =
case member of
CmTypedef a -> handler^.handleType $ a
CmMethod a -> handler^.handleMethod $ a
CmOperatorOverload a -> handler^.handleOperator $ a
CmVariable a -> handler^.handleVariable $ a
CmFriendClass a -> handler^.handleFriendClass $ a
CmDestructor a) -> handler^.handleDestructor $ a
{-|
A dummy format method.
I kept things simple here, but you could define much more complicated
functions.
You could even define some generic functions separately and... you could define
them with some extra arguments that you would only provide when building
the Handler! An (dummy!) example is the way the destructor function is
constructed.
-}
format :: Handler String
format = Handler
(\x -> "type")
(\x -> "method")
(\x -> "operator")
(\x -> "variable")
(\x -> "Friend")
(destructorFunc $ (++) "format ")
{-|
A dummy function showcasing partial application.
It has one more argument than handleDestructor. In practice you are free
to add as many as you wish as long as it ends with the expected type
(Destructor -> String).
-}
destructorFunc :: (String -> String) -> Destructor -> String
destructorFunc f _ = f "destructor"
{-|
Construction of the pretty handler which illustrates the reason why
using lens by keeping a nice and concise syntax.
The "&" is the backward operator and ".~" is the set operator.
All we do here is to change the functions of the handleType and the
handleDestructor.
-}
pretty :: Handler String
pretty = format & handleType .~ (\x -> "Pretty type")
& handleDestructor .~ (destructorFunc ((++) "Pretty "))
And now we can run some tests:
test1 = handle format (CmDestructor Destructor)
> "format destructor"
test2 = handle pretty (CmDestructor Destructor)
> "Pretty destructor"

Resources