Elegant way to do associated "data items" - haskell

I am trying to have associated "data items" of ContentType of and its content:
data ContentType = MyInt | MyBool deriving ( Show )
data Encoding'
= EncodingInt [Int]
| EncodingBool [Bool]
chooseContentType :: IO ContentType
chooseContentType = undefined
How do I make something like this, but type-checked?
data Encoding a =
Encoding { contentType :: ContentType
, content :: [a]
}

The feature you're looking for is called generalized algebraic data types (or GADTs, for short). It's a GHC extension, so you'll have to put a pragma at the top of your file to use them. (I'm also including StandaloneDeriving, because we'll use that to get your Show instance in a minute)
{-# LANGUAGE GADTs, StandaloneDeriving #-}
Now you can define your ContentType type fully. The idea is that ContentType is parameterized by the (eventual) type of its argument, so ContentType will have kind * -> * (i.e. it will take a type argument).
Now we're going to write it with a bit of a funny syntax.
data ContentType a where
MyInt :: ContentType Int
MyBool :: ContentType Bool
This says that MyInt isn't just a ContentType; it's a ContentType Int. We're keeping the type information alive in ContentType itself.
Now Encoding can be written basically the way you have it.
data Encoding a =
Encoding { contentType :: ContentType a
, content :: [a]
}
Encoding also takes a type parameter a. Its content must be a list of a, and its content type must be a content type that supports the type a. Since our example has only defined two content types, that means it must be MyInt for integers or MyBool for Booleans, and no other encoding will be supported.
We can recover your deriving (Show) on ContentType with a StandaloneDeriving clause (this is the second GHC extension we turned on in the above pragma)
deriving instance Show (ContentType a)
This is equivalent to your deriving clause, except that it sits on its own line since the GADT syntax doesn't really have a good place to put one in the same line.

Related

Type design for the AST of my language remembering token locations

I wrote a parser and evaluator for a simple programming language. Here is a simplified version of the types for the AST:
data Value = IntV Int | FloatV Float | BoolV Bool
data Expr = IfE Value [Expr] | VarDefE String Value
type Program = [Expr]
I want error messages to tell the line and column of the source code in which the error occured. For example, if the value in an If expression is not a boolean, I want the evaluator to show an error saying "expected boolean at line x, column y", with x and y referring to the location of the value.
So, what I need to do is redefine the previous types so that they can store the relevant locations of different things. One option would be to add a location to each constructor for expressions, like so:
type Location = (Int, Int)
data Expr = IfE Value [Expr] Location | VarDef String Value Location
This clearly isn't optimal, because I have to add those Location fields to every possible expression, and if for example a value contained other values, I would need to add locations to that value too:
{-
this would turn into FunctionCall String [Value] [Location],
with one location for each value in the function call
-}
data Value = ... | FunctionCall String [Value]
I came up with another solution, which allows me to add locations to everything:
data Located a = Located Location a
type LocatedExpr = Located Expr
type LocatedValue = Located Value
data Value = IntV Int | FloatV Float | BoolV Bool | FunctionCall String [LocatedValue]
data Expr = IfE LocatedValue [LocatedExpr] | VarDef String LocatedValue
data Program = [LocatedExpr]
However I don't like this that much. First of all, it clutters the definition of the evaluator and pattern matching has an extra layer every time. Also, I don't think saying that a function call takes located values as arguments is quite right. Function calls should take values as arguments, and locations should be metadata that doesn't interfere with the evaluator.
I need help redefining my types so that the solution is as clean as possible. Maybe there is a language extension or a design pattern I don't know about that could be helpful.
There are many ways to annotate an AST! This is half of what’s known as the AST typing problem, the other half being how you manage an AST that changes over the course of compilation. The problem isn’t exactly “solved”: all of the solutions have tradeoffs, and which one to pick depends on your expected use cases. I’ll go over a few that you might like to investigate at the end.
Whichever method you choose for organising the actual data types, if it makes pattern-matching ugly or unwieldy, the natural solution is PatternSynonyms.
Considering your first example:
{-# Language PatternSynonyms #-}
type Location = (Int, Int)
data Expr
= LocatedIf Value [Expr] Location
| LocatedVarDef String Value Location
-- Unidirectional pattern synonyms which ignore the location:
pattern If :: Value -> [Expr] -> Expr
pattern If val exprs <- LocatedIf val exprs _loc
pattern VarDef :: String -> Value -> Expr
pattern VarDef name expr <- LocatedVarDef name expr _loc
-- Inform GHC that matching ‘If’ and ‘VarDef’ is just as good
-- as matching ‘LocatedIf’ and ‘LocatedVarDef’.
{-# Complete If, VarDef #-}
This may be sufficiently tidy for your purposes already. But here are a few more tips that I find helpful.
Put annotations first: when adding an annotation type to an AST directly, I often prefer to place it as the first parameter of each constructor, so that it can be conveniently partially applied.
data LocatedExpr
= LocatedIf Location Value [Expr]
| LocatedVarDef Location String Value
If the annotation is a location, then this also makes it more convenient to obtain when writing certain kinds of parsers, along the lines of AnnotatedIf <$> (getSourceLocation <* ifKeyword) <*> value <*> many expr in a parser combinator library.
Parameterise your annotations: I often make the annotation type into a type parameter, so that GHC can derive some useful classes for me:
{-# Language
DeriveFoldable,
DeriveFunctor,
DeriveTraversable #-}
data AnnotatedExpr a
= AnnotatedIf a Value [Expr]
| AnnotatedVarDef a String Value
deriving (Functor, Foldable, Traversable)
type LocatedExpr = AnnotatedExpr Location
-- Get the annotation of an expression.
-- (Total as long as every constructor is annotated.)
exprAnnotation :: AnnotatedExpr a -> a
exprAnnotation = head
-- Update annotations purely.
mapAnnotations
:: (a -> b)
-> AnnotatedExpr a -> AnnotatedExpr b
mapAnnotations = fmap
-- traverse, foldMap, &c.
If you want “doesn’t interfere”, use polymorphism: you can enforce that the evaluator can’t inspect the annotation type by being polymorphic over it. Pattern synonyms still let you match on these expressions conveniently:
pattern If :: Value -> [AnnotatedExpr a] -> AnnotatedExpr a
pattern If val exprs <- AnnotatedIf _anno val exprs
-- …
eval :: AnnotatedExpr a -> Value
eval expr = case expr of
If val exprs -> -- …
VarDef name expr -> -- …
Unannotated terms aren’t your enemy: a term without source locations is no good for error reporting, but I think it’s still a good idea to make the pattern synonyms bidirectional for the convenience of constructing unannotated terms with a unit () annotation. (Or something equivalent, if you use e.g. Maybe Location as the annotation type.)
The reason is that this is quite convenient for writing unit tests, where you want to check the output, but want to use Eq instead of pattern matching, and don’t want to have to compare all the source locations in tests that aren’t concerned with them. Using the derived classes, void :: (Functor f) => f a -> f () strips out all the annotations on an AST.
import Control.Monad (void)
type BareExpr = AnnotatedExpr ()
-- One way to define bidirectional synonyms, so e.g.
-- ‘If’ can be used as either a pattern or a constructor.
pattern If :: Value -> [BareExpr] -> BareExpr
pattern If val exprs = AnnotatedIf () val exprs
-- …
stripAnnotations :: AnnotatedExpr a -> BareExpr
stripAnnotations = void
Equivalently, you could use GADTs / ExistentialQuantification to say data AnyExpr where { AnyExpr :: AnnotatedExpr a -> AnyExpr } / data AnyExpr = forall a. AnyExpr (AnnotatedExpr a); that way, the annotations have exactly as much information as (), but you don’t need to fmap over the entire tree with void in order to strip it, just apply the AnyExpr constructor to hide the type.
Finally, here are some brief introductions to a few AST typing solutions.
Annotate each AST node with a tag (e.g. a unique ID), then store all metadata like source locations, types, and whatever else, separately from the AST:
import Data.IntMap (IntMap)
-- More sophisticated/stronglier-typed tags are possible.
newtype Tag = Tag Int
newtype TagMap a = TagMap (IntMap a)
data Expr
= If !Tag Value [Expr]
| VarDef !Tag String Expr
type Span = (Location, Location)
type SourceMap = TagMap Span
type CommentMap = TagMap (Span, String)
parse
:: String -- Input
-> Either ParseError
( Expr -- Parsed expression
, SourceMap -- Source locations of tags
, CommentMap -- Sideband for comments
-- …
)
The advantage is that you can very easily mix in arbitrary new types of annotations anywhere, without affecting the AST itself, and avoid rewriting the AST just to change annotations. You can think of the tree and annotation tables as a kind of database, where the tags are the “foreign keys” relating them. A downside is that you must be careful to maintain these tags when you do rewrite the AST.
I don’t know if this approach has an established name; I think of it as just “tagging” or a “tagged AST”.
recursion-schemes and/or Data Types à la CartePDF: separate out the “recursive” part of an annotated expression tree from the “annotation” part, and use Fix to tie them back together, with Compose (or Cofree) to add annotations in the middle.
data ExprF e
= IfF Value [e]
| VarDefF String e
-- …
deriving (Foldable, Functor, Traversable, …)
-- Unannotated: Expr ~ ExprF (ExprF (ExprF (…)))
type Expr = Fix ExprF
-- With a location at each recursive step:
--
-- LocatedExpr ~ Located (ExprF (Located (ExprF (…))))
type LocatedExpr = Fix (Compose Located ExprF)
data Located a = Located Location a
deriving (Foldable, Functor, Traversable, …)
-- or: type Located = (,) Location
A distinct advantage is that you get a bunch of nice traversal stuff like cata for free-ish, so you can avoid having to write manual traversals over your AST over and over. A downside is that it adds some pattern clutter to clean up, as does the “à la carte” approach, but they do offer a lot of flexibility.
Trees That GrowPDF is overkill for just source locations, but in a serious compiler it’s quite helpful. If you expect to have more than one annotation type (such as inferred types or other analysis results) or an AST that changes over time, then you add a type parameter for the “compilation phase” (parsed, renamed, typechecked, desugared, &c.) and select field types or enable & disable constructors based on that index.
A really unfortunate downside of this is that you often have to rewrite the tree even in places nothing has changed, because everything depends on the “phase”. An alternative that I use is to add one type parameter for each type of phase or annotation that can vary independently, e.g. data Expr annotation termVarName typeVarName, and abstract over that with type and pattern synonyms. This lets you update indices independently and still use classes like Functor and Bitraversable.

Associate a non-type-class function with a type in Haskell

This is a follow-up question to Associate a function with a type in Haskell.
Again, suppose you have a serializer/deserializer type class
class SerDes a where
ser :: a -> ByteString
des :: ByteString -> a
and you want to provide some sanity check or test case that varies with type a,
check :: ByteString -> Bool
Of course des is used deep in the guts of check but the type a can't be deduced. To do
something useful with a it probably needs to be member of some type class, let's say it's Show a.
As usual, Proxy can do the job:
data Proxy a = Proxy -- or import Data.Proxy
check :: Proxy a -> ByteString -> Bool
check (Proxy :: Proxy MyType) input = ...
check (Proxy :: Proxy MyOtherType) input = ...
(With the extension TypeApplications this can be made more succinct: check (Proxy #MyType) ....)
But can Proxy be avoided? (Provided you can't move check into the SerDes type class.)
I must first mention that you have a problem in the design of your typeclass. The signature of your des function says that for each input ByteString there exists a valid output a. This is likely not true.
For example, imagine having an instance of SerDes for Int. By your definition, you'll have a valid Int representation even for 6GB ByteString of random data. Sounds wrong.
For this reason, you need to specify the possibility of deserialization failures in the signature of des. A typical approach would be to wrap a in Maybe or Either YourDetailedRepresentationOfFailure. E.g.,
class SerDes a where
ser :: a -> ByteString
des :: ByteString -> Either Text a
In fact this is the approach essentially taken by all Haskell serialization and parsing libraries. They may introduce some abstractions there, but in essence they all find ways to represent deserialization failures.
Now to your actual question. Typeclass instances are identified by the type they are for, so you must provide the specific type for a somehow. Proxy is one option for that. Another one is to directly refer to that type without using it in your function and passing in the undefined value for it. Third (and cleanest one, IMO) is to wrap the result in Tagged.
The undefined option
{-# LANGUAGE ScopedTypeVariables, TypeApplications #-}
check :: forall a. SerDes a => a -> ByteString -> Bool
check _ bytes =
isRight (des #a bytes)
Please notice that forall and ScopedTypeVariables extension are required to be able to refer to the type parameter in the function def.
You'll then call this function like this:
check (undefined :: a) bytes
or this:
check #a undefined bytes
The Tagged option
check :: SerDes a => ByteString -> Tagged a Bool
check bytes =
fmap isRight (Tagged (des bytes))
You'll then call this function like this:
unTagged (check #a bytes)
Final note
In reality you'll likely never need the check function, since des already carries all the information needed and more. It's easier and more understandable to just have isRight (des #a bytes) in place where you need check. The fact that you have to go thru complications for defining check is actually a signal of a design mistake. In practical Haskell code you'll rarely meet such complications.
Yes, you can specify the type a via TypeApplications in the exact same way as in the answer you have linked:
{-# LANGUAGE ScopedTypeVariables, AllowAmbiguousTypes, TypeApplications #-}
check :: forall a. SerDes a => ByteString -> Bool
check bytes = ... des #a bytes ...
Note that you need ScopedTypeVariables and forall a. in order to create a scope for the type variable a so that you can refer to it when calling des. Without an explicit forall, the type variable's scope would be limited to just the type signature and you couldn't mention it in the body.
To call the check function, use type application as before:
check #Int bytes

Tagging a string with corresponding symbol

I would like an easy way to create a String tagged with itself. Right now I can
do something like:
data TagString :: Symbol -> * where
Tag :: String -> TagString s
deriving Show
tag :: KnownSymbol s => Proxy s -> TagString s
tag s = Tag (symbolVal s)
and use it like
tag (Proxy :: Proxy "blah")
But this is not nice because
The guarantee about the tag is only provided by tag not by the GADT.
Every time I want to create a value I have to provide a type signature, which
gets unwieldy if the value is part of some bigger expression.
Is there any way to improve this, preferably going in the opposite direction, i.e. from String to Symbol? I would like to write Tag "blah" and have ghc infer the type
TagString "blah".
GHC.TypeLits provides the someSymbolVal function which looks somewhat
related but it produces a SomeSymbol, not a Symbol and I can quite grasp how to use
it.
Is there any way to improve this, preferably going in the opposite direction, i.e. from String to Symbol?
There is no way to go directly from String to Symbol, because Haskell isn't dependently typed, unfortunately. You do have to write out a type annotation every time you want a new value and there isn't an existing tag with the desired symbol already around.
The guarantee about the tag is only provided by tag not by the GADT.
The following should work well (in fact, the same type can be found in the singletons package):
data SSym :: Symbol -> * where
SSym :: KnownSymbol s => SSym s
-- defining values
sym1 = SSym :: SSym "foo"
sym2 = SSym :: SSym "bar"
This type essentially differs from Proxy only by having the KnownSymbol dictionary in the constructor. The dictionary lets us recover the string contained within even if the symbol is not known statically:
extractString :: SSym s -> String
extractString s#SSym = symbolVal s
We pattern matched on SSym, thereby bringing into scope the implicit KnownSymbol dictionary. The same doesn't work with a mere Proxy:
extractString' :: forall (s :: Symbol). Proxy s -> String
extractString' p#Proxy = symbolVal p
-- type error, we can't recover the string from anywhere
... it produces a SomeSymbol, not a Symbol and I can quite grasp how to use it.
SomeSymbol is like SSym except it hides the string it carries around so that it doesn't appear in the type. The string can be recovered by pattern matching on the constructor.
extractString'' :: SomeSymbol -> String
extractString'' (SomeSymbol proxy) = symbolVal proxy
It can be useful when you want to manipulate different symbols in bulk, for example you can put them in a list (which you can't do with different SSym-s, because their types differ).

Haskell data type fields

I have my own data type:
type Types = String
data MyType = MyType [Types]
I have a utility function:
initMyType :: [Types] -> MyType
initMyType types = Mytype types
Now I create:
let a = MyType ["1A", "1B", "1C"]
How can I get the list ["1A", "1B", "1C"] from a? And in general, how can I get data from a data constructor?
Besides using pattern matching, as in arrowdodger's answer, you can also use record syntax to define the accessor automatically:
data MyType = MyType { getList :: [Types] }
This defines a type exactly the same as
data MyType = MyType [Types]
but also defines a function
getList :: MyType -> [Types]
and allows (but doesn't require) the syntax MyType { getList = ["a", "b", "c"] } for constructing values of MyType. By the way, initMyTypes is not really necessary unless it does something else besides constructing the value, because it does exactly the same as the constructor MyType (but can't be used in pattern matching).
You can pattern-match it somewhere in your code, or write deconstructing function:
getList (MyType lst) = lst
There are many ways of pattern matching in Haskell. See http://www.haskell.org/tutorial/patterns.html for details on where patterns can be used (e.g. case statements), and on different kinds of patterns (lazy patterns, 'as' patterns etc)
(Answer for future generations.)
So, your task is to obtain all fields from a data type constructor.
Here is a pretty elegant way to do this.
We're going to use DeriveFoldable GHC extension, so to enable it we must
change your datatype declaration like so: data MyType a = MyType [a] deriving (Show, Foldable).
Here is a generic way to get all data from a data constructor.
This is the whole solution:
{-# LANGUAGE DeriveFoldable #-}
import Data.Foldable (toList)
type Types = String
data MyType a = MyType [a] deriving (Show, Foldable)
main =
let a = MyType ["1A", "1B", "1C"] :: MyType Types
result = toList a
in print result
Printing the result will give you '["1A","1B","1C"]' as you wanted.

Haskell -- any way to qualify or disambiguate record names?

I have two data types, which are used for hastache templates. It makes sense in my code to have two different types, both with a field named "name". This, of course, causes a conflict. It seems that there's a mechanism to disambiguate any calls to "name", but the actual definition causes problems. Is there any workaround, say letting the record field name be qualified?
data DeviceArray = DeviceArray
{ name :: String,
bytes :: Int }
deriving (Eq, Show, Data, Typeable)
data TemplateParams = TemplateParams
{ arrays :: [DeviceArray],
input :: DeviceArray }
deriving (Eq, Show, Data, Typeable)
data MakefileParams = MakefileParams
{ name :: String }
deriving (Eq, Show, Data, Typeable)
i.e. if the fields are now used in code, they will be "DeviceArray.name" and "MakefileParams.name"?
As already noted, this isn't directly possible, but I'd like to say a couple things about proposed solutions:
If the two fields are clearly distinct, you'll want to always know which you're using anyway. By "clearly distinct" here I mean that there would never be a circumstance where it would make sense to do the same thing with either field. Given this, excess disambiguity isn't really unwelcome, so you'd want either qualified imports as the standard approach, or the field disambiguation extension if that's more to your taste. Or, as a very simplistic (and slightly ugly) option, just manually prefix the fields, e.g. deviceArrayName instead of just name.
If the two fields are in some sense the same thing, it makes sense to be able to treat them in a homogeneous way; ideally you could write a function polymorphic in choice of name field. In this case, one option is using a type class for "named things", with functions that let you access the name field on any appropriate type. A major downside here, besides a proliferation of trivial type constraints and possible headaches from the Dreaded Monomorphism Restriction, is that you also lose the ability to use the record syntax, which begins to defeat the whole point.
The other major option for similar fields, which I didn't see suggested yet, is to extract the name field out into a single parameterized type, e.g. data Named a = Named { name :: String, item :: a }. GHC itself uses this approach for source locations in syntax trees, and while it doesn't use record syntax the idea is the same. The downside here is that if you have a Named DeviceArray, accessing the bytes field now requires going through two layers of records. If you want to update the bytes field with a function, you're stuck with something like this:
addBytes b na = na { item = (item na) { bytes = b + bytes (item na) } }
Ugh. There are ways to mitigate the issue a bit, but they're still not idea, to my mind. Cases like this are why I don't like record syntax in general. So, as a final option, some Template Haskell magic and the fclabels package:
{-# LANGUAGE TemplateHaskell #-}
import Control.Category
import Data.Record.Label
data Named a = Named
{ _name :: String,
_namedItem :: a }
deriving (Eq, Show, Data, Typeable)
data DeviceArray = DeviceArray { _bytes :: Int }
deriving (Eq, Show, Data, Typeable)
data MakefileParams = MakefileParams { _makefileParams :: [MakeParam] }
deriving (Eq, Show, Data, Typeable)
data MakeParam = MakeParam { paramText :: String }
deriving (Eq, Show, Data, Typeable)
$(mkLabels [''Named, ''DeviceArray, ''MakefileParams, ''MakeParam])
Don't mind the MakeParam business, I just needed a field on there to do something with. Anyway, now you can modify fields like this:
addBytes b = modL (namedItem >>> bytes) (b +)
nubParams = modL (namedItem >>> makefileParams) nub
You could also name bytes something like bytesInternal and then export an accessor bytes = namedItem >>> bytesInternal if you like.
Record field names are in the same scope as the data type, so you cannot do this directly.
The common ways to work around this is to either add prefixes to the field names, e.g. daName, mpName, or put them in separate modules which you then import qualified.
What you can do is to put each data type in its own module, then you can used qualified imports to disambiguate. It's a little clunky, but it works.
There are several GHC extensions which may help. The linked one is applicable in your case.
Or, you could refactor your code and use typeclasses for the common fields in records. Or, you should manually prefix each record selector with a prefix.
If you want to use the name in both, you can use a Class that define the name funcion. E.g:
Class Named a where
name :: a -> String
data DeviceArray = DeviceArray
{ deviceArrayName :: String,
bytes :: Int }
deriving (Eq, Show, Data, Typeable)
instance Named DeviceArray where
name = deviceArrayName
data MakefileParams = MakefileParams
{ makefileParamsName :: String }
deriving (Eq, Show, Data, Typeable)
instance Named MakefileParams where
name = makefileParamsName
And then you can use name on both classes.

Resources