Is it possible to use SYB to transform the type?

Is it possible to use SYB to transform the type? - haskell

I want to write a rename function to replace String names (which represent hierarchical identifiers) in my AST with GUID names (integers) from a symbol table carried as hidden state in a Renamer monad.
I have an AST a type that is parameterized over the type of name. Names in the leaves of the AST are of type Name a:
data Name a = Name a
Which makes it easy to target them with a SYB transformer.
The parser is typed (ignoring the possibility of error for brevity):
parse :: String -> AST String
and I want the rename function to be typed:
rename :: AST String -> Renamer (AST GUID)
Is it possible to use SYB to transform all Name String's into Name GUID's with a transformer:
resolveName :: Name String -> Renamer (Name GUID)
and all other values from c String to c GUID by transforming their children, and pasting them back together with the same constructor, albeit with a different type parameter?
The everywhereM function is close to what I want, but it can only transform c a -> m (c a) and not c a -> m (c b).
My fallback solution (other than writing the boiler-plate by hand) is to remove the type parameter from AST, and define Name like this:
data Name = StrName String
| GuidName GUID
so that the rename would be typed:
rename :: AST -> Renamer AST
making it work with everywhereM. However, this would leave the possibility that an AST could still hold StrName's after being renamed. I wanted to use the type system to formally capture the fact that a renamed AST can only hold GUID names.

One solution (perhaps less efficient than you were hoping for) would be to make your AST an instance of Functor, Data and Typeable (GHC 7 can probably derive all of these for you) then do:
import Data.Generics.Uniplate.Data(universeBi) -- from the uniplate package
import qualified Data.Map as Map
rename :: AST String -> Renamer (AST GUID)
rename x = do
let names = nub $ universeBi x :: [Name String]
guids <- mapM resolveName names
let mp = Map.fromList $ zip names guids
return $ fmap (mp Map.!) x
Two points:
I'm assuming it's easy to eliminate the Name bit from resolveName, but I suspect it is.
You can switch universeBi for something equivalent in SYB, but I find it much easier to understand the Uniplate versions.

Related

Type design for the AST of my language remembering token locations

I wrote a parser and evaluator for a simple programming language. Here is a simplified version of the types for the AST:
data Value = IntV Int | FloatV Float | BoolV Bool
data Expr = IfE Value [Expr] | VarDefE String Value
type Program = [Expr]
I want error messages to tell the line and column of the source code in which the error occured. For example, if the value in an If expression is not a boolean, I want the evaluator to show an error saying "expected boolean at line x, column y", with x and y referring to the location of the value.
So, what I need to do is redefine the previous types so that they can store the relevant locations of different things. One option would be to add a location to each constructor for expressions, like so:
type Location = (Int, Int)
data Expr = IfE Value [Expr] Location | VarDef String Value Location
This clearly isn't optimal, because I have to add those Location fields to every possible expression, and if for example a value contained other values, I would need to add locations to that value too:
{-
this would turn into FunctionCall String [Value] [Location],
with one location for each value in the function call
-}
data Value = ... | FunctionCall String [Value]
I came up with another solution, which allows me to add locations to everything:
data Located a = Located Location a
type LocatedExpr = Located Expr
type LocatedValue = Located Value
data Value = IntV Int | FloatV Float | BoolV Bool | FunctionCall String [LocatedValue]
data Expr = IfE LocatedValue [LocatedExpr] | VarDef String LocatedValue
data Program = [LocatedExpr]
However I don't like this that much. First of all, it clutters the definition of the evaluator and pattern matching has an extra layer every time. Also, I don't think saying that a function call takes located values as arguments is quite right. Function calls should take values as arguments, and locations should be metadata that doesn't interfere with the evaluator.
I need help redefining my types so that the solution is as clean as possible. Maybe there is a language extension or a design pattern I don't know about that could be helpful.

There are many ways to annotate an AST! This is half of what’s known as the AST typing problem, the other half being how you manage an AST that changes over the course of compilation. The problem isn’t exactly “solved”: all of the solutions have tradeoffs, and which one to pick depends on your expected use cases. I’ll go over a few that you might like to investigate at the end.
Whichever method you choose for organising the actual data types, if it makes pattern-matching ugly or unwieldy, the natural solution is PatternSynonyms.
Considering your first example:
{-# Language PatternSynonyms #-}
type Location = (Int, Int)
data Expr
= LocatedIf Value [Expr] Location
| LocatedVarDef String Value Location
-- Unidirectional pattern synonyms which ignore the location:
pattern If :: Value -> [Expr] -> Expr
pattern If val exprs <- LocatedIf val exprs _loc
pattern VarDef :: String -> Value -> Expr
pattern VarDef name expr <- LocatedVarDef name expr _loc
-- Inform GHC that matching ‘If’ and ‘VarDef’ is just as good
-- as matching ‘LocatedIf’ and ‘LocatedVarDef’.
{-# Complete If, VarDef #-}
This may be sufficiently tidy for your purposes already. But here are a few more tips that I find helpful.
Put annotations first: when adding an annotation type to an AST directly, I often prefer to place it as the first parameter of each constructor, so that it can be conveniently partially applied.
data LocatedExpr
= LocatedIf Location Value [Expr]
| LocatedVarDef Location String Value
If the annotation is a location, then this also makes it more convenient to obtain when writing certain kinds of parsers, along the lines of AnnotatedIf <$> (getSourceLocation <* ifKeyword) <*> value <*> many expr in a parser combinator library.
Parameterise your annotations: I often make the annotation type into a type parameter, so that GHC can derive some useful classes for me:
{-# Language
DeriveFoldable,
DeriveFunctor,
DeriveTraversable #-}
data AnnotatedExpr a
= AnnotatedIf a Value [Expr]
| AnnotatedVarDef a String Value
deriving (Functor, Foldable, Traversable)
type LocatedExpr = AnnotatedExpr Location
-- Get the annotation of an expression.
-- (Total as long as every constructor is annotated.)
exprAnnotation :: AnnotatedExpr a -> a
exprAnnotation = head
-- Update annotations purely.
mapAnnotations
:: (a -> b)
-> AnnotatedExpr a -> AnnotatedExpr b
mapAnnotations = fmap
-- traverse, foldMap, &c.
If you want “doesn’t interfere”, use polymorphism: you can enforce that the evaluator can’t inspect the annotation type by being polymorphic over it. Pattern synonyms still let you match on these expressions conveniently:
pattern If :: Value -> [AnnotatedExpr a] -> AnnotatedExpr a
pattern If val exprs <- AnnotatedIf _anno val exprs
-- …
eval :: AnnotatedExpr a -> Value
eval expr = case expr of
If val exprs -> -- …
VarDef name expr -> -- …
Unannotated terms aren’t your enemy: a term without source locations is no good for error reporting, but I think it’s still a good idea to make the pattern synonyms bidirectional for the convenience of constructing unannotated terms with a unit () annotation. (Or something equivalent, if you use e.g. Maybe Location as the annotation type.)
The reason is that this is quite convenient for writing unit tests, where you want to check the output, but want to use Eq instead of pattern matching, and don’t want to have to compare all the source locations in tests that aren’t concerned with them. Using the derived classes, void :: (Functor f) => f a -> f () strips out all the annotations on an AST.
import Control.Monad (void)
type BareExpr = AnnotatedExpr ()
-- One way to define bidirectional synonyms, so e.g.
-- ‘If’ can be used as either a pattern or a constructor.
pattern If :: Value -> [BareExpr] -> BareExpr
pattern If val exprs = AnnotatedIf () val exprs
-- …
stripAnnotations :: AnnotatedExpr a -> BareExpr
stripAnnotations = void
Equivalently, you could use GADTs / ExistentialQuantification to say data AnyExpr where { AnyExpr :: AnnotatedExpr a -> AnyExpr } / data AnyExpr = forall a. AnyExpr (AnnotatedExpr a); that way, the annotations have exactly as much information as (), but you don’t need to fmap over the entire tree with void in order to strip it, just apply the AnyExpr constructor to hide the type.
Finally, here are some brief introductions to a few AST typing solutions.
Annotate each AST node with a tag (e.g. a unique ID), then store all metadata like source locations, types, and whatever else, separately from the AST:
import Data.IntMap (IntMap)
-- More sophisticated/stronglier-typed tags are possible.
newtype Tag = Tag Int
newtype TagMap a = TagMap (IntMap a)
data Expr
= If !Tag Value [Expr]
| VarDef !Tag String Expr
type Span = (Location, Location)
type SourceMap = TagMap Span
type CommentMap = TagMap (Span, String)
parse
:: String -- Input
-> Either ParseError
( Expr -- Parsed expression
, SourceMap -- Source locations of tags
, CommentMap -- Sideband for comments
-- …
)
The advantage is that you can very easily mix in arbitrary new types of annotations anywhere, without affecting the AST itself, and avoid rewriting the AST just to change annotations. You can think of the tree and annotation tables as a kind of database, where the tags are the “foreign keys” relating them. A downside is that you must be careful to maintain these tags when you do rewrite the AST.
I don’t know if this approach has an established name; I think of it as just “tagging” or a “tagged AST”.
recursion-schemes and/or Data Types à la CartePDF: separate out the “recursive” part of an annotated expression tree from the “annotation” part, and use Fix to tie them back together, with Compose (or Cofree) to add annotations in the middle.
data ExprF e
= IfF Value [e]
| VarDefF String e
-- …
deriving (Foldable, Functor, Traversable, …)
-- Unannotated: Expr ~ ExprF (ExprF (ExprF (…)))
type Expr = Fix ExprF
-- With a location at each recursive step:
--
-- LocatedExpr ~ Located (ExprF (Located (ExprF (…))))
type LocatedExpr = Fix (Compose Located ExprF)
data Located a = Located Location a
deriving (Foldable, Functor, Traversable, …)
-- or: type Located = (,) Location
A distinct advantage is that you get a bunch of nice traversal stuff like cata for free-ish, so you can avoid having to write manual traversals over your AST over and over. A downside is that it adds some pattern clutter to clean up, as does the “à la carte” approach, but they do offer a lot of flexibility.
Trees That GrowPDF is overkill for just source locations, but in a serious compiler it’s quite helpful. If you expect to have more than one annotation type (such as inferred types or other analysis results) or an AST that changes over time, then you add a type parameter for the “compilation phase” (parsed, renamed, typechecked, desugared, &c.) and select field types or enable & disable constructors based on that index.
A really unfortunate downside of this is that you often have to rewrite the tree even in places nothing has changed, because everything depends on the “phase”. An alternative that I use is to add one type parameter for each type of phase or annotation that can vary independently, e.g. data Expr annotation termVarName typeVarName, and abstract over that with type and pattern synonyms. This lets you update indices independently and still use classes like Functor and Bitraversable.

Safe Record field query

Is there a clean way to avoid the following boilerplate:
Given a Record data type definition....
data Value = A{ name::String } | B{ name::String } | C{}
write a function that safely returns name
getName :: Value -> Maybe String
getName A{ name=x } = Just x
getName B{ name=x } = Just x
getName C{} = Nothing
I know you can do this with Template Haskell, I am looking for a cleaner soln than that, perhaps a GHC extension or something else I've overlooked.

lens's Template Haskell helpers do the right thing when they encounter partial record fields.
{-# LANGUAGE TemplateHaskell #-}
import Control.Applicative
import Control.Lens
data T = A { _name :: String }
| B { _name :: String }
| C
makeLenses ''T
This'll generate a Traversal' called name that selects the String inside the A and B constructors and does nothing in the C case.
ghci> :i name
name :: Traversal' T String -- Defined at test.hs:11:1
So we can use the ^? operator (which is a flipped synonym for preview) from Control.Lens.Fold to pull out Maybe the name.
getName :: T -> Maybe String
getName = (^? name)
You can also make Prism's for the constructors of your datatype, and choose the first one of those which matches using <|>. This version is useful when the fields of your constructors have different names, but you do have to remember to update your extractor function when you add constructors.
makePrisms ''T
getName' :: T -> Maybe String
getName' t = t^?_A <|> t^?_B
lens is pretty useful!

Why don't you use a GADT? I do not know if you are interested in using only records. But, I fell that GADTs provide a clean solution to your problem, since you can restrict what constructors are valid by refining types.
{-# LANGUAGE GADTs #-}
module Teste where
data Value a where
A :: String -> Value String
B :: String -> Value String
C :: Value ()
name :: Value String -> String
name (A s) = s
name (B s) = s
Notice that both A and B produce Value String values while C produces Value (). When you define function
name :: Value String -> String
it specifically says that you can only pass a value that has a string in it. So, you can only pattern match on A or B values. This is useful to avoid the need of Maybe in code.

Multiple words in function type declaration confuses me

I understand haskell's function type declaration like,
length :: String -> Int
prefix :: Int -> String -> String
But sometimes, the types on the right side are not simple types like String, Integer but it contains multiple literal words, and the words that look like custom defined etc.
For example, these types defined on this post,
withLocation :: Q Exp -> Q Exp
What does Q, Exp mean?
formatLoc :: Loc -> String
What does Loc mean? Is it part of haskell library?

The types Q, Exp and Loc are types from the Template Haskell module imported at the beginning of the source file.
Q is a parameterized type, just like, say, Maybe or IO from the prelude, which is here applied to Exp.
How to do goto defintion from emacs editor?
This can be achieved using Scion, but that won't help you for this use case as it does not allow you to jump into external libraries (which may not be available in source form anyway).

Tagging a string with corresponding symbol

I would like an easy way to create a String tagged with itself. Right now I can
do something like:
data TagString :: Symbol -> * where
Tag :: String -> TagString s
deriving Show
tag :: KnownSymbol s => Proxy s -> TagString s
tag s = Tag (symbolVal s)
and use it like
tag (Proxy :: Proxy "blah")
But this is not nice because
The guarantee about the tag is only provided by tag not by the GADT.
Every time I want to create a value I have to provide a type signature, which
gets unwieldy if the value is part of some bigger expression.
Is there any way to improve this, preferably going in the opposite direction, i.e. from String to Symbol? I would like to write Tag "blah" and have ghc infer the type
TagString "blah".
GHC.TypeLits provides the someSymbolVal function which looks somewhat
related but it produces a SomeSymbol, not a Symbol and I can quite grasp how to use
it.

Is there any way to improve this, preferably going in the opposite direction, i.e. from String to Symbol?
There is no way to go directly from String to Symbol, because Haskell isn't dependently typed, unfortunately. You do have to write out a type annotation every time you want a new value and there isn't an existing tag with the desired symbol already around.
The guarantee about the tag is only provided by tag not by the GADT.
The following should work well (in fact, the same type can be found in the singletons package):
data SSym :: Symbol -> * where
SSym :: KnownSymbol s => SSym s
-- defining values
sym1 = SSym :: SSym "foo"
sym2 = SSym :: SSym "bar"
This type essentially differs from Proxy only by having the KnownSymbol dictionary in the constructor. The dictionary lets us recover the string contained within even if the symbol is not known statically:
extractString :: SSym s -> String
extractString s#SSym = symbolVal s
We pattern matched on SSym, thereby bringing into scope the implicit KnownSymbol dictionary. The same doesn't work with a mere Proxy:
extractString' :: forall (s :: Symbol). Proxy s -> String
extractString' p#Proxy = symbolVal p
-- type error, we can't recover the string from anywhere
... it produces a SomeSymbol, not a Symbol and I can quite grasp how to use it.
SomeSymbol is like SSym except it hides the string it carries around so that it doesn't appear in the type. The string can be recovered by pattern matching on the constructor.
extractString'' :: SomeSymbol -> String
extractString'' (SomeSymbol proxy) = symbolVal proxy
It can be useful when you want to manipulate different symbols in bulk, for example you can put them in a list (which you can't do with different SSym-s, because their types differ).

Is there a nice(r) way of writing this Template Haskell code involving singleton data types?

I've just started to use Template Haskell (I've finally got a use case, yay!) and now I'm cognitively stuck.
What I'm trying to do is generating a singleton datatype declaration of the form
data $V = $V deriving (Eq,Ord)
starting from a name V (hopefully starting with an uppercase character!). To be explicit, I'm trying to write a function declareSingleton of type String -> DecsQ (I should mention here I'm using GHC 7.6.1, template-haskell version 2.8.0.0) such that the splice
$(declareSingleton "Foo")
is the equivalent of
data Foo = Foo deriving (Eq,Ord)
I've got the following code working and doing what I want, but I'm not very happy with it:
declareSingleton :: String -> Q [Dec]
declareSingleton s = let n = mkName s in sequence [
dataD (cxt []) n [] [normalC n []] [''Eq,''Ord]
]
I was hoping to get something like the following to work:
declareSingleton :: String -> Q [Dec]
declareSingleton s = let n = mkName s in
[d| data $n = $n deriving (Eq,Ord) |]
I've tried, to no avail (but not exhaustively!), various combinations of $s, $v, $(conT v), v, 'v so I have to suppose my mental model of how Template Haskell works is too simplistic.
Am I missing something obvious here, am I confusing type names and constructor names in some essential way, and can I write declareSingleton in a nice(r) way?
If so, how; if not, why not?
(Side remark: the Template Haskell API changes rapidly, and I'm happy for that - I want this simple type to eventually implement a multi-parameter type class with an associated type family - but the churn the API is currently going through doesn't make it easy to search for tutorials! There's a huge difference how TH was implemented in 6.12.1 or 7.2 (when most of the existing tutorial were written) versus how it works nowadays...)

From the Template Haskell documentation:
A splice can occur in place of
an expression; the spliced expression must have type Q Exp
an type; the spliced expression must have type Q Typ
a list of top-level declarations; the spliced expression must have type Q [Dec]
So e.g. constructor names simply cannot be spliced in the current version of Template Haskell.
I don't think there's much you can do to simplify this use-case (barring constructing the whole declaration as a string and transforming it into a Dec via toDec in haskell-src-meta).
You might consider simply binding the different parts of the declaration to local variables. While more verbose, it makes the code a bit easier to read.
declareSingleton :: String -> Q [Dec]
declareSingleton s = return [DataD context name vars cons derives] where
context = []
name = mkName s
vars = []
cons = [NormalC name fields]
fields = []
derives = [''Eq, ''Ord]

a hack that parses an interpolated string rather than constructing the AST:
makeU1 :: String -> Q [Dec]
makeU1 = makeSingletonDeclaration >>> parseDecs >>> fromRight >>> return
makeSingletonDeclaration :: String -> String
makeSingletonDeclaration name = [qq|data {name} = {name} deriving (Show)|]
where:
parseDecs :: String -> Either String [Dec]
usage:
makeU1 "T"
main = print T
it needs these packages:
$ cabal install haskell-src-exts
$ cabal install haskell-src-meta
$ cabal install interpolatedstring-perl6
runnable script:
https://github.com/sboosali/haskell/blob/0794eb7a52cf4c658270e49fa4f0d9e53c4b0ebf/TemplateHaskell/Splice.hs

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string