Using Uniplate in two-level tree type - haskell

I'm in the beginning stages of writing a parser for a C-like language in Haskell. I've got the AST data type down, and I'm playing around with it by writing some simple queries on the AST itself before I delve into the parser side of things.
My AST revolves around two types: statements (have no value, like an if/else) and expressions (have a value, like a literal or binary operation). So it looks something like this (vastly simplified, of course):
data Statement
= Return Expession
| If Expression Expression
data Expression
= Literal Int
| Variable String
| Binary Expression Op Expression
Say I want to get the names of all variables used in an expression. With uniplate, it's easy:
varsInExpression exp = concat [s | Variable s <- universe exp]
But what if I want to find a list of variables in a statement? In each constructor of Statement, there is a nested Expression that I should apply varsInExpression to. So at the moment, it looks like I'd have to pattern-match against every Statement constructor, which is what uniplate's out to avoid. Am I just not grokking the documentation well enough, or is this a limitation of uniplate (or am I doing it wrong?)?

This seems like a good use-case for biplates. I'm relying on the slower Data.Data method, but it makes this code pretty trivial.
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data
import Data.Typeable
import Data.Generics.Uniplate.Data
data Statement
= Return Expression
| If Expression Expression
deriving(Data, Typeable)
data Expression
= Literal Int
| Variable String
| Binary Expression Int Expression
deriving(Data, Typeable)
vars :: Statement -> [String]
vars stmt = [ s | Variable s <- universeBi stmt]
Basically biplates are a generalized notion of uniplates where the target type isn't necessarily the same as the source, eg
biplate :: from -> (Str to, Str to -> from)

Related

Type design for the AST of my language remembering token locations

I wrote a parser and evaluator for a simple programming language. Here is a simplified version of the types for the AST:
data Value = IntV Int | FloatV Float | BoolV Bool
data Expr = IfE Value [Expr] | VarDefE String Value
type Program = [Expr]
I want error messages to tell the line and column of the source code in which the error occured. For example, if the value in an If expression is not a boolean, I want the evaluator to show an error saying "expected boolean at line x, column y", with x and y referring to the location of the value.
So, what I need to do is redefine the previous types so that they can store the relevant locations of different things. One option would be to add a location to each constructor for expressions, like so:
type Location = (Int, Int)
data Expr = IfE Value [Expr] Location | VarDef String Value Location
This clearly isn't optimal, because I have to add those Location fields to every possible expression, and if for example a value contained other values, I would need to add locations to that value too:
{-
this would turn into FunctionCall String [Value] [Location],
with one location for each value in the function call
-}
data Value = ... | FunctionCall String [Value]
I came up with another solution, which allows me to add locations to everything:
data Located a = Located Location a
type LocatedExpr = Located Expr
type LocatedValue = Located Value
data Value = IntV Int | FloatV Float | BoolV Bool | FunctionCall String [LocatedValue]
data Expr = IfE LocatedValue [LocatedExpr] | VarDef String LocatedValue
data Program = [LocatedExpr]
However I don't like this that much. First of all, it clutters the definition of the evaluator and pattern matching has an extra layer every time. Also, I don't think saying that a function call takes located values as arguments is quite right. Function calls should take values as arguments, and locations should be metadata that doesn't interfere with the evaluator.
I need help redefining my types so that the solution is as clean as possible. Maybe there is a language extension or a design pattern I don't know about that could be helpful.
There are many ways to annotate an AST! This is half of what’s known as the AST typing problem, the other half being how you manage an AST that changes over the course of compilation. The problem isn’t exactly “solved”: all of the solutions have tradeoffs, and which one to pick depends on your expected use cases. I’ll go over a few that you might like to investigate at the end.
Whichever method you choose for organising the actual data types, if it makes pattern-matching ugly or unwieldy, the natural solution is PatternSynonyms.
Considering your first example:
{-# Language PatternSynonyms #-}
type Location = (Int, Int)
data Expr
= LocatedIf Value [Expr] Location
| LocatedVarDef String Value Location
-- Unidirectional pattern synonyms which ignore the location:
pattern If :: Value -> [Expr] -> Expr
pattern If val exprs <- LocatedIf val exprs _loc
pattern VarDef :: String -> Value -> Expr
pattern VarDef name expr <- LocatedVarDef name expr _loc
-- Inform GHC that matching ‘If’ and ‘VarDef’ is just as good
-- as matching ‘LocatedIf’ and ‘LocatedVarDef’.
{-# Complete If, VarDef #-}
This may be sufficiently tidy for your purposes already. But here are a few more tips that I find helpful.
Put annotations first: when adding an annotation type to an AST directly, I often prefer to place it as the first parameter of each constructor, so that it can be conveniently partially applied.
data LocatedExpr
= LocatedIf Location Value [Expr]
| LocatedVarDef Location String Value
If the annotation is a location, then this also makes it more convenient to obtain when writing certain kinds of parsers, along the lines of AnnotatedIf <$> (getSourceLocation <* ifKeyword) <*> value <*> many expr in a parser combinator library.
Parameterise your annotations: I often make the annotation type into a type parameter, so that GHC can derive some useful classes for me:
{-# Language
DeriveFoldable,
DeriveFunctor,
DeriveTraversable #-}
data AnnotatedExpr a
= AnnotatedIf a Value [Expr]
| AnnotatedVarDef a String Value
deriving (Functor, Foldable, Traversable)
type LocatedExpr = AnnotatedExpr Location
-- Get the annotation of an expression.
-- (Total as long as every constructor is annotated.)
exprAnnotation :: AnnotatedExpr a -> a
exprAnnotation = head
-- Update annotations purely.
mapAnnotations
:: (a -> b)
-> AnnotatedExpr a -> AnnotatedExpr b
mapAnnotations = fmap
-- traverse, foldMap, &c.
If you want “doesn’t interfere”, use polymorphism: you can enforce that the evaluator can’t inspect the annotation type by being polymorphic over it. Pattern synonyms still let you match on these expressions conveniently:
pattern If :: Value -> [AnnotatedExpr a] -> AnnotatedExpr a
pattern If val exprs <- AnnotatedIf _anno val exprs
-- …
eval :: AnnotatedExpr a -> Value
eval expr = case expr of
If val exprs -> -- …
VarDef name expr -> -- …
Unannotated terms aren’t your enemy: a term without source locations is no good for error reporting, but I think it’s still a good idea to make the pattern synonyms bidirectional for the convenience of constructing unannotated terms with a unit () annotation. (Or something equivalent, if you use e.g. Maybe Location as the annotation type.)
The reason is that this is quite convenient for writing unit tests, where you want to check the output, but want to use Eq instead of pattern matching, and don’t want to have to compare all the source locations in tests that aren’t concerned with them. Using the derived classes, void :: (Functor f) => f a -> f () strips out all the annotations on an AST.
import Control.Monad (void)
type BareExpr = AnnotatedExpr ()
-- One way to define bidirectional synonyms, so e.g.
-- ‘If’ can be used as either a pattern or a constructor.
pattern If :: Value -> [BareExpr] -> BareExpr
pattern If val exprs = AnnotatedIf () val exprs
-- …
stripAnnotations :: AnnotatedExpr a -> BareExpr
stripAnnotations = void
Equivalently, you could use GADTs / ExistentialQuantification to say data AnyExpr where { AnyExpr :: AnnotatedExpr a -> AnyExpr } / data AnyExpr = forall a. AnyExpr (AnnotatedExpr a); that way, the annotations have exactly as much information as (), but you don’t need to fmap over the entire tree with void in order to strip it, just apply the AnyExpr constructor to hide the type.
Finally, here are some brief introductions to a few AST typing solutions.
Annotate each AST node with a tag (e.g. a unique ID), then store all metadata like source locations, types, and whatever else, separately from the AST:
import Data.IntMap (IntMap)
-- More sophisticated/stronglier-typed tags are possible.
newtype Tag = Tag Int
newtype TagMap a = TagMap (IntMap a)
data Expr
= If !Tag Value [Expr]
| VarDef !Tag String Expr
type Span = (Location, Location)
type SourceMap = TagMap Span
type CommentMap = TagMap (Span, String)
parse
:: String -- Input
-> Either ParseError
( Expr -- Parsed expression
, SourceMap -- Source locations of tags
, CommentMap -- Sideband for comments
-- …
)
The advantage is that you can very easily mix in arbitrary new types of annotations anywhere, without affecting the AST itself, and avoid rewriting the AST just to change annotations. You can think of the tree and annotation tables as a kind of database, where the tags are the “foreign keys” relating them. A downside is that you must be careful to maintain these tags when you do rewrite the AST.
I don’t know if this approach has an established name; I think of it as just “tagging” or a “tagged AST”.
recursion-schemes and/or Data Types à la CartePDF: separate out the “recursive” part of an annotated expression tree from the “annotation” part, and use Fix to tie them back together, with Compose (or Cofree) to add annotations in the middle.
data ExprF e
= IfF Value [e]
| VarDefF String e
-- …
deriving (Foldable, Functor, Traversable, …)
-- Unannotated: Expr ~ ExprF (ExprF (ExprF (…)))
type Expr = Fix ExprF
-- With a location at each recursive step:
--
-- LocatedExpr ~ Located (ExprF (Located (ExprF (…))))
type LocatedExpr = Fix (Compose Located ExprF)
data Located a = Located Location a
deriving (Foldable, Functor, Traversable, …)
-- or: type Located = (,) Location
A distinct advantage is that you get a bunch of nice traversal stuff like cata for free-ish, so you can avoid having to write manual traversals over your AST over and over. A downside is that it adds some pattern clutter to clean up, as does the “à la carte” approach, but they do offer a lot of flexibility.
Trees That GrowPDF is overkill for just source locations, but in a serious compiler it’s quite helpful. If you expect to have more than one annotation type (such as inferred types or other analysis results) or an AST that changes over time, then you add a type parameter for the “compilation phase” (parsed, renamed, typechecked, desugared, &c.) and select field types or enable & disable constructors based on that index.
A really unfortunate downside of this is that you often have to rewrite the tree even in places nothing has changed, because everything depends on the “phase”. An alternative that I use is to add one type parameter for each type of phase or annotation that can vary independently, e.g. data Expr annotation termVarName typeVarName, and abstract over that with type and pattern synonyms. This lets you update indices independently and still use classes like Functor and Bitraversable.

Haskell AST with recursive types

I'm currently trying to build a lambda calculus solver, and I'm having a slight problem with constructing the AST. A lambda calculus term is inductively defined as:
1) A variable
2) A lambda, a variable, a dot, and a lambda expression.
3) A bracket, a lambda expression, a lambda expression and a bracket.
What I would like to do (and at first tried) is this:
data Expr =
Variable
| Abstract Variable Expr
| Application Expr Expr
Now obviously this doesn't work, since Variable is not a type, and Abstract Variable Expr expects types. So my hacky solution to this is to have:
type Variable = String
data Expr =
Atomic Variable
| Abstract Variable Expr
| Application Expr Expr
Now this is really annoying since I don't like the Atomic Variable on its own, but Abstract taking a string rather than an expr. Is there any way I can make this more elegant, and do it like the first solution?
Your first solution is just an erroneous definition without meaning. Variable is not a type there, it's a nullary value constructor. You can't refer to Variable in a type definition much like you can't refer to any value, like True, False or 100.
The second solution is in fact the direct translation of something we could write in BNF:
var ::= <string>
term ::= λ <var>. <term> | <term> <term> | <var>
And thus there is nothing wrong with it.
What you exactly want is to have some type like
data Expr
= Atomic Variable
| Abstract Expr Expr
| Application Expr Expr
But constrain first Expr in Abstract constructor to be only Atomic. There is no straightforward way to do this in Haskell because value of some type can be created by any constructor of this type. So the only approach is to make some separate data type or type alias for existing type (like your Variable type alias) and move all common logic into it. Your solution with Variable seems very ok to me.
But. You can use some other advanced features in Haskell to achieve you goal in different way. You can be inspired by glambda package which uses GADT to create typed lambda calculus. Also see this answer: https://stackoverflow.com/a/39931015/2900502
I can come up with next solution to achieve you minimal goals (if you only want to constrain first argument of Abstract):
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
data AtomicT
data AbstractT
data ApplicationT
data Expr :: * -> * where
Atomic :: String -> Expr AtomicT
Abstract :: Expr AtomicT -> Expr a -> Expr AbstractT
Application :: Expr a -> Expr b -> Expr ApplicationT
And next example works fine:
ex1 :: Expr AbstractT
ex1 = Abstract (Atomic "x") (Atomic "x")
But this example won't compile because of type mismatch:
ex2 :: Expr AbstractT
ex2 = Abstract ex1 ex1

How can I easily express that I don't care about a value of a particular data field?

I was writing tests for my parser, using a method which might not be the best, but has been working for me so far. The tests assumed perfectly defined AST representation for every code block, like so:
(parse "x = 5") `shouldBe` (Block [Assignment [LVar "x"] [Number 5.0]])
However, when I moved to more complex cases, a need for more "fuzzy" verification arised:
(parse "t.x = 5") `shouldBe` (Block [Assignment [LFieldRef (Var "t") (StringLiteral undefined "x")] [Number 5.0]])
I put in undefined in this example to showcase the field I don't want to be compared to the result of parse (It's a source position of a string literal). Right now the only way of fixing that I see is rewriting the code to make use of shouldSatisfy instead of shouldBe, which I'll have to do if I don't find any other solution.
You can write a normalizePosition function which replaces all the position data in your AST with some fixed dummyPosition value, and then use shouldBe against a pattern built from the same dummy value.
If the AST is very involved, consider writing this normalization using Scrap-Your-Boilerplate.
One way to solve this is to parametrize your AST over source locations:
{-# LANGUAGE DeriveFunctor #-}
data AST a = ...
deriving (Eq, Show, Functor)
Your parse function would then return an AST with SourceLocations:
parse :: String -> AST SourceLocation
As we derived a Functor instance above, we can easily replace source locations with something else, e.g. ():
import Data.Functor ((<$))
parseTest :: String -> AST ()
parseTest input = () <$ parse input
Now, just use parseTest instead of parse in your specs.

How to include code in different places during compilations in Haskell?

Quasi-quotes allow generating AST code during compilations, but it inserts generated code at the place where Quasi-quote was written. Is it possible in any way to insert the compile-time generated code elsewhere? For example in specific module files which are different from the one where QQ was written? It would depend on hard-coded module structure, but that's fine.
If that's not possible with QQ but anyone knows a different way of achieving it, I am open for suggestions.
To answer this, it's helpful to know what a quasi-quoter is. From the GHC Documentation, a quasi-quoter is a value of
data QuasiQuoter = QuasiQuoter { quoteExp :: String -> Q Exp,
quotePat :: String -> Q Pat,
quoteType :: String -> Q Type,
quoteDec :: String -> Q [Dec] }
That is, it's a parser from an arbitrary String to one or more of ExpQ, PatQ, TypeQ, and DecQ, which are Template Haskell representations of expressions, patterns, types, and declarations respectively.
When you use a quasi-quote, GHC applies the parser to the String to create a ExpQ (or other type), then splices in the resulting template haskell expression to produce an actual value.
It sounds like what you're asking to do is separate the quasiquote parsing and splicing, so that you have access to the TH expression. Then you can import that expression into another module and splice it there yourself.
Knowing the type of a quasi-quoter, it's readily apparent this is possible. Normally you use a QQ as
-- file Expr.hs
eval :: Expr -> Integer
expr = QuasiQuoter { quoteExp = parseExprExp, quotePat = parseExprPat }
-- file Foo.hs
import Expr
myInt = eval [expr|1 + 2|]
Instead, you can extract the parser yourself, get a TH expression, and splice it later:
-- file Foo.hs
import Expr
-- run the QQ parser
myInt_TH :: ExpQ
myInt_TH = quoteExp expr "1 + 2"
-- file Bar.hs
import Foo.hs
-- run the TH splice
myInt = $(myInt_TH)
Of course if you're writing all this yourself, you can skip the quasi-quotes and use a parser and Template Haskell directly. It's pretty much the same thing either way.

looking for a way to add reserved words in data definitions in haskell

As the title says im trying to build an interpreter for Imperative languages in Haskell. Iv done 90% of it, nevertheless, im trying to build If statements, and my question is, how will i define a new datatype lets say :
data x = if boolExp then exp else exp.
I understand that i could re-write this with something like doIf boolExp exp exp. But i would like to see if i could use those reserved keywords just for fun (and maybe conciseness).
Note that both boolExp and exp are defined in my language and work correctly( i even evaluate them to get the actual expression "value").
So bottom line is, how will i add reserved keywords in my data definition as required above?
The closest you can get is the RebindableSyntax extension in GHC, which treats if cond then truePart else falsePart as if you had written ifThenElse cond truePart falsePart, with whatever ifThenElse is in scope.
You can't use this for a data constructor, but you can write something like this:
{-# LANGUAGE RebindableSyntax #-}
import Prelude
data BoolExp = Foo
deriving Show
data Exp = If BoolExp Exp Exp | Bar
deriving Show
ifThenElse :: BoolExp -> Exp -> Exp -> Exp
ifThenElse = If
example = if Foo then Bar else Bar
main = print example
Running this prints If Foo Bar Bar.
However, unless you're writing some kind of internal DSL where this would make sense, I strongly recommend just sticking with the regular syntax like If cond truePart falsePart. Writing it this way has no real benefit and serves only to confuse people when reading your code.
Here's one dumb way to get the kind of syntax you want:
data If a = If Bool () a () a
if_ = If
then_ = ()
else_ = ()
expr :: If Int
expr = if_ (3 > 4) then_ 0 else_ 1
But like hammar said, if you want to actually make use of the keywords if, then, and else, you'll need to use RebindableSyntax, or some other preprocessor/macro, otherwise you'll be in conflict with Haskell syntax.
Besides the suggestions already given, you can also use the quasiquote feature of Template Haskell to create whatever syntax you want.

Resources