How can I remove all the boilerplate introduced by Trees That Grow?


I'm trying to define a programming language in Haskell. I wish to make the AST extensible: users of the AST module (for instance a pretty printer, an interpreter, a compiler, a type system, a language server and so on) should be able to extend it, by adding both new functionalities and new data (new datatypes to extend the syntax as well as new fields to the current data constructors to store data needed by the various components).
I tried to achieve this goal by using Trees That Grow (TTG). It works, but it results in way too much boilerplate. My minimal prototype becomes 10 times larger in terms of lines of code, and this number grows by the AST size times the number of extensions. Changing something minor requires changing several lines of the AST module, while changing something in the way extensibility is implemented would require rewriting most of it.
Is there any way to reduce the amount of boilerplate needed, or ideally remove it altogether?
Example with code of what I have so far
The "base" AST
This is just one small piece of the AST. It's something very similar to JSON, as I decided to start with a small prototype.
module AST ( KeyValue(..), Data(..) ) where
data KeyValue = KV String Data deriving (Show, Eq, Ord)
data Data =
    Null |
    Int Int |
    Num Double |
    Bool Bool |
    String String |
    Array [Data] |
    Object [KeyValue] deriving (Show, Eq, Ord)
The extensible AST, via Trees That Grow
In order to extend it via TTG, the datatypes become something like this:
data KeyValueX x =
    KVX (XKV x) String (DataX x) |
    KeyValueX (XKeyValue x)
data DataX x =
    NullX (XNull x) |
    IntX (XInt x) Int |
    NumX (XNum x) Double |
    BoolX (XBool x) Bool |
    StringX (XString x) String |
    ArrayX (XArray x) [DataX x] |
    ObjectX (XObject x) [KeyValueX x] |
    DataX (XData x)
Each of those types with a name starting in X is a type family:
type family XKV x
type family XKeyValue x
type family XNull x
type family XInt x
type family XNum x
type family XBool x
type family XString x
type family XArray x
type family XObject x
type family XData x
Furthermore, each of them needs to be listed in a constraint synonym that makes it easier to derive instances:
type ForallX (c :: Type -> Constraint) x = (
    c (XKV x), c (XKeyValue x),
    c (XNull x), c (XInt x), c (XNum x), c (XBool x),
    c (XString x), c (XArray x), c (XObject x), c (XData x)
    )
-- now we can do:
deriving instance ForallX Show x => Show (KeyValueX x)
deriving instance ForallX Show x => Show (DataX x)
deriving instance ForallX Eq x => Eq (KeyValueX x)
deriving instance ForallX Eq x => Eq (DataX x)
deriving instance ForallX Ord x => Ord (KeyValueX x)
deriving instance ForallX Ord x => Ord (DataX x)
And of course everything needs to be exported:
module AST ( KeyValueX(..), DataX(..),
    XKV, XKeyValue,
    XNull, XInt, XNum, XBool, XString, XArray, XObject, XData,
    ForallX
    ) where
An extension to the AST
This is what is needed in order to create an extension, even just the "identity" extension (UnDecorated), which needs to be provided.
For every extension you need to write a type instance for every type family, that is, one per data type and one per data constructor:
data UD -- UnDecorated, identity extension
type instance XKV UD = ()
type instance XKeyValue UD = Void
type instance XData UD = Void
type instance XNull UD = ()
type instance XInt UD = ()
type instance XNum UD = ()
type instance XBool UD = ()
type instance XString UD = ()
type instance XArray UD = ()
type instance XObject UD = ()
Then, to do things properly and keep the AST ergonomic for the user, you need pattern synonyms and type aliases for every data constructor and data type:
type KeyValue = KeyValueX UD
pattern KV :: String -> Data -> KeyValue
pattern KV x y <- KVX _ x y where KV x y = KVX () x y
type Data = DataX UD
pattern Null :: Data
pattern Null <- NullX _ where Null = NullX ()
pattern DInt :: Int -> Data
pattern DInt x <- IntX _ x where DInt x = IntX () x
pattern DNum :: Double -> Data
pattern DNum x <- NumX _ x where DNum x = NumX () x
pattern DBool :: Bool -> Data
pattern DBool x <- BoolX _ x where DBool x = BoolX () x
pattern DString :: String -> Data
pattern DString x <- StringX _ x where DString x = StringX () x
pattern Array :: [Data] -> Data
pattern Array x <- ArrayX _ x where Array x = ArrayX () x
pattern Object :: [KeyValue] -> Data
pattern Object x <- ObjectX _ x where Object x = ObjectX () x
And of course all this stuff should be exported too:
module AST ( ...,
    UD,
    KeyValue, Data,
    pattern KV,
    pattern Null, pattern DInt, pattern DNum, pattern DBool,
    pattern DString, pattern Array, pattern Object
    ) where
Summary
TTG turned my simple 10-line module into a module of more than 100 lines where 90% of the code is boring, hard-to-maintain boilerplate:
The original (unextensible) AST module took around 10 lines.
The AST for the extensible version ended up taking about 50 lines, and each of the data constructors (including their related type families) is mentioned around 4 times.
On top of that, every AST extension (including the required "identity" one) takes another 50 lines and mentions each of the data constructors another 3 times.
I would estimate that the whole language could take a couple dozen types with a total of more than a hundred data constructors. Then I would need to define a handful of extensions to the AST. A non-extensible AST would take around 100 lines (as an order of magnitude), while one extended via TTG would take around 10,000. All the required boilerplate would make all of this unmanageable for me.
Question
Is there any way to reduce the amount of boilerplate needed, or ideally remove it altogether?
Otherwise are there any alternative ways to make my AST extensible without requiring this much work?

You can merge all of the type families into one indexed by a symbol:
data KeyValueX x =
    KVX (X "KVX" x) String (DataX x) |
    KeyValueX (X "KeyValueX" x)
    deriving Generic
data DataX x =
    NullX (X "NullX" x) |
    IntX (X "IntX" x) Int |
    NumX (X "NumX" x) Double |
    BoolX (X "BoolX" x) Bool |
    StringX (X "StringX" x) String |
    ArrayX (X "ArrayX" x) [DataX x] |
    ObjectX (X "ObjectX" x) [KeyValueX x] |
    DataX (X "DataX" x)
    deriving Generic
--
type family X (s :: k) (x :: l) :: Type
Use generics to grab all of the constructor names:
type ForAllX c x = (AllX c (CNames (DataX x)) x, AllX c (CNames (KeyValueX x)) x)
deriving instance ForAllX Eq x => Eq (DataX x)
deriving instance ForAllX Eq x => Eq (KeyValueX x)
-- CNames defined using generics, below
All of the boilerplate up to that point could also be generated from the "base AST" using Template Haskell.
Having only one type family makes it easy to define extensions with catch-all clauses:
data UD
type instance X s UD = XUD s
type family XUD (s :: Symbol) :: Type where
    XUD "KeyValueX" = Void
    XUD "DataX" = Void
    XUD _ = ()
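To illustrate how cheap a further extension becomes with a single family, here is a minimal sketch of a second, hypothetical extension that stores a source line number on every node. It uses a cut-down `DataX` and a monokinded `X` for brevity; the names `Located`, `XLocated`, `lineOf` and `allLines` are assumptions made for this example and do not appear in the answer above.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, UndecidableInstances #-}

import Data.Kind (Type)
import Data.Void (Void, absurd)
import GHC.TypeLits (Symbol)

-- The single extension family, indexed by constructor name.
type family X (s :: Symbol) (x :: Type) :: Type

-- A cut-down version of the extensible AST.
data DataX x
  = NullX (X "NullX" x)
  | IntX (X "IntX" x) Int
  | ArrayX (X "ArrayX" x) [DataX x]
  | DataX (X "DataX" x)

-- Hypothetical extension: every ordinary constructor carries a line number,
-- and the extension constructor is disabled by making its field Void.
data Located

type instance X s Located = XLocated s

type family XLocated (s :: Symbol) :: Type where
  XLocated "DataX" = Void
  XLocated _ = Int

-- Read the decoration of the root node.
lineOf :: DataX Located -> Int
lineOf (NullX l) = l
lineOf (IntX l _) = l
lineOf (ArrayX l _) = l
lineOf (DataX v) = absurd v

-- Collect the decorations of all nodes, parents before children.
allLines :: DataX Located -> [Int]
allLines (ArrayX l xs) = l : concatMap allLines xs
allLines other = [lineOf other]
```

The whole extension is four declarations regardless of how many constructors the AST has; only constructors that deviate from the default need their own equation.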
As for the patterns, maybe just exposing the constructors is not so bad? GHC does that.
Imports and generics code to make this answer self-contained:
{-# LANGUAGE
    ConstraintKinds,
    DataKinds,
    DeriveGeneric,
    FlexibleContexts,
    PolyKinds,
    StandaloneDeriving,
    TypeFamilies,
    TypeOperators,
    UndecidableInstances #-}
module T where
import Data.Kind (Constraint, Type)
import Data.Void
import GHC.Generics
import GHC.TypeLits
type CNames a = GCNames (Rep a)
type family GCNames (f :: Type -> Type) :: [Symbol] where
    GCNames (M1 D c f) = GCNames f
    GCNames (f :+: g) = GCNames f ++ GCNames g
    GCNames (M1 C (MetaCons name _ _) f) = '[name]
type family (xs :: [k]) ++ (ys :: [k]) :: [k] where
    '[] ++ ys = ys
    (x ': xs) ++ ys = x ': (xs ++ ys)
type family AllX (c :: Type -> Constraint) (xs :: [Symbol]) (x :: l) :: Constraint where
    AllX c '[] x = ()
    AllX c (s ': ss) x = (c (X s x), AllX c ss x)
Gist: https://gist.github.com/Lysxia/3f6781b3a307a7e0c564920d6277bee2

Related

Implement Ordering via hashing

I have a relatively large set of algebraic data types where I can't automatically derive Eq and Ord because a single field in the data type is considered metadata and shouldn't be considered in equality and ordering. For example a data type might look like this:
data Foo = A Int | B String | C String Int | ... | Z String String Int
Where every Int in this case is metadata.
So what I do is manually implement Eq by just comparing constructors. But for Ord this becomes insanity because if I have n constructors I have to implement n^2 compare functions. So currently my workaround is to manually implement Hashable, which requires me to implement a single hash function for every constructor, and then just do a hash compare in my Ord instance.
This has some problems, obviously, since compare (hash x) (hash y) == EQ -> x == y doesn't hold: two different values can share the same hash. However, this can be handled by first manually checking for equality and, when the hashes collide but the values are not equal, always saying that the left-hand side is smaller than the right-hand side.
However, now there are values a and b for which a < b && b < a holds, which I'm not sure is allowed for a Haskell Ord instance. So basically my question is whether it is OK to implement Ord like this. The reason I need Ord is that many libraries require it, for instance graph libraries and map libraries.
Here is a full example:
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE ViewPatterns #-}
module Test where
import Prelude
import Data.Bits (xor)
import Data.Hashable (Hashable (..))
data Foo = A Int | B String | C String Int | Z String String Int
instance Eq Foo where
    (A _) == (A _) = True
    (B x1) == (B x2) = x1 == x2
    (C x1 _) == (C x2 _) = x1 == x2
    (Z x1 y1 _) == (Z x2 y2 _) = x1 == x2 && y1 == y2
    _ == _ = False
instance Hashable Foo where
    hashWithSalt s (A _) = s `xor` (hash @Int 1)
    hashWithSalt s (B x) = s `xor` (hash @Int 2) `xor` (hash x)
    hashWithSalt s (C x _) = s `xor` (hash @Int 3) `xor` (hash x)
    hashWithSalt s (Z x y _) = s `xor` (hash @Int 4) `xor` (hash x) `xor` (hash y)
instance Ord Foo where
    compare (hash -> a) (hash -> b) = case compare a b of
        EQ -> if a == b then EQ else LT
        e -> e

Well, this turned out to be a little more complicated than I expected when I actually wrote it all up, so maybe someone can come up with something simpler, but...
If you have freedom to modify your types, I would suggest making your type polymorphic in the offending integer type and deriving a functor:
{-# LANGUAGE DeriveFunctor #-}
data FooF int = A int | B String | C String int | Z String String int deriving (Functor)
Now, your original type is given by the alias:
type Foo = FooF Int
You can use a standalone deriving clause to derive Eq and Ord for FooF ():
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneDeriving #-}
deriving instance Eq (FooF ())
deriving instance Ord (FooF ())
and then with a conversion function that forgets the integers:
forgetInts :: Foo -> FooF ()
forgetInts x = () <$ x
you can write Foo instances as follows:
import Data.Function
instance Eq Foo where
    (==) = (==) `on` forgetInts
instance Ord Foo where
    compare = compare `on` forgetInts
One drawback is that you might need some additional type signatures or annotations, since A 10 is no longer unambiguously FooF Int as opposed to FooF Double. See main below, for example.
Full code:
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneDeriving #-}
import Data.Function
data FooF int = A int | B String | C String int | Z String String int deriving (Functor)
type Foo = FooF Int
deriving instance Eq (FooF ())
deriving instance Ord (FooF ())
forgetInts :: Foo -> FooF ()
forgetInts x = () <$ x
instance Eq Foo where
    (==) = (==) `on` forgetInts
instance Ord Foo where
    compare = compare `on` forgetInts
main = do
    print $ Z "foo" "bar" 1 == (Z "foo" "bar" 2 :: Foo)
    print $ compare (A 10) (A 20 :: Foo)

Here's a hashless solution that may work even if you have multiple metadata types (where the Functor answer I posted separately doesn't work). If you have the flexibility to wrap your metadata in a newtype, you can use Eq and Ord instances for the newtype to "shield" the metadata from the derived Eq and Ord:
-- Meta data is always equal
newtype Meta a = Meta a
instance Eq (Meta a) where
    x == y = True
    x /= y = False
instance Ord (Meta a) where
    compare x y = EQ
Then, a type like:
data Foo = A (Meta Int) | B String | C String (Meta Bool)
    | Z String String (Meta String) deriving (Eq, Ord)
with derived Eq and Ord instances compares as if the metadata isn't there:
main = do
    print $ Z "foo" "bar" (Meta "different") == Z "foo" "bar" (Meta "but still the same")
    print $ compare (A (Meta 10)) (A (Meta 20))
Here, the drawback is the usual issue with newtype wrappers: you need to wrap and unwrap (or coerce) metadata.
Full code:
newtype Meta a = Meta a
instance Eq (Meta a) where
    x == y = True
    x /= y = False
instance Ord (Meta a) where
    compare x y = EQ
data Foo = A (Meta Int) | B String | C String (Meta Bool)
    | Z String String (Meta String) deriving (Eq, Ord)
main = do
    print $ Z "foo" "bar" (Meta "different") == Z "foo" "bar" (Meta "but still the same")
    print $ compare (A (Meta 10)) (A (Meta 20))
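The wrapping and unwrapping mentioned as a drawback can be softened with `coerce` from `Data.Coerce`. The following is a small sketch under the assumption that the `Meta` constructor is in scope at the use site; the helpers `metaInt` and `unwrapAll` and the reduced `Foo` type are hypothetical names chosen for illustration.

```haskell
import Data.Coerce (coerce)

-- Metadata wrapper whose comparisons always succeed, as in the answer above.
newtype Meta a = Meta a

instance Eq (Meta a) where
  _ == _ = True

instance Ord (Meta a) where
  compare _ _ = EQ

-- Reduced example type; the Int fields are metadata.
data Foo = A (Meta Int) | B String | C String (Meta Int)
  deriving (Eq, Ord)

-- Hypothetical accessor: recover the metadata without matching on Meta.
metaInt :: Foo -> Maybe Int
metaInt (A m) = Just (coerce m)
metaInt (B _) = Nothing
metaInt (C _ m) = Just (coerce m)

-- coerce also converts through containers such as lists, with no traversal.
unwrapAll :: [Meta Int] -> [Int]
unwrapAll = coerce
```

Because `Meta` is a newtype, `coerce` is free at runtime, so the cost of this approach is purely syntactic.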

Type-level constraints in instances of type families

Is it possible to have type synonym families for parametrized data such as Data.Param.FSVec?
Ideally, I would like this to compile:
class A e where
type Arg e a
f :: (Arg e a -> b) -> e a -> e b
instance A X where
type Arg X a = Nat size => FSVec size a
f = {- implementation -}
I have tried several workarounds, like wrapping FSVec size a in a newtype, or constraint synonyms, but it seems that I could not get anything reasonable right.
Context + minimal working example
A is a class previously defined (for example) as such:
class OldA e where
f :: (Maybe a -> b) -> [e (Maybe a)] -> [e b]
An example of type inheriting OldA is:
data Y a = Y a
instance Functor Y where
fmap f (Y a) = Y (f a)
instance OldA Y where
f = fmap . fmap
I want to extend this class to be able to express more general function arguments for f. Let's say we have a type X and an associated function fIndependent:
import qualified Data.Param.FSVec as V
import Data.TypeLevel hiding ((==))
data X a = X a deriving Show
fromX (X a) = a
fIndependent :: (Nat size) => (V.FSVec size (Maybe a) -> b) -> [X (Maybe a)] -> [X b]
fIndependent _ [] = []
fIndependent f xs = let x' = (V.reallyUnsafeVector . take c . fmap fromX) xs
                        xs' = drop c xs
                        c = V.length x'
                    in if c == length (V.fromVector x') then X (f x') : fIndependent f xs' else []
fIndependent is sane itself. Testing it with a function
test :: V.FSVec D2 x -> Int
test a = V.length a
will grant the result:
>>> fIndependent test $ map (X . Just) [1,2,3,4,5,6,7,8,9]
[X 2, X 2, X 2, X 2]
Ok, now how to extend OldA? The most "natural" thing that came into my mind is to equip class A with a type synonym family Arg e a as below.
class NewA e where
type Arg e a
f :: (Arg e a -> b) -> [e (Maybe a)] -> [e b]
Converting all existing instances is easy:
instance NewA Y where
type Arg Y a = Maybe a
f = fmap . fmap -- old implementation
To express fIndependent as f is the difficult part, since just adding
instance NewA X where
type Arg X a = (Nat size) => FSVec size (Maybe a) -- wrong!!!
f = {- same as fIndependent -}
does not work. This is what I have trouble with.
Try-outs
Most solutions I saw propose wrapping FSVec inside a newtype. Doing so does not help, since with the following code:
{-# LANGUAGE RankNTypes #-}
newtype ArgV a = ArgV (forall rate.Nat rate => V.FSVec rate (Maybe a))
instance NewA X where
type Arg X a = ArgV a
g f xs = let x' = (V.reallyUnsafeVector . take c . fmap fromX) xs
xs' = drop c xs
c = V.length x'
in if c == length (V.fromVector x') then X (f $ ArgV x') : g f xs' else []
the type inference system seems to lose the information about size:
Couldn't match type ‘s0’ with ‘rate’ …
because type variable ‘rate’ would escape its scope
This (rigid, skolem) type variable is bound by
a type expected by the context: Nat rate => V.FSVec rate (Maybe a)
Expected type: V.FSVec rate (Maybe a)
Actual type: V.FSVec s0 (Maybe a)
Relevant bindings include
x' :: V.FSVec s0 (Maybe a)
(bound at ...)
In the first argument of ‘Args’, namely ‘x'’
In the second argument of ‘($)’, namely ‘Args x'’
Compilation failed.
I would appreciate any lead or hint in this matter.

It appears that you are using a class Nat :: k -> Constraint and a data type FSVec :: k -> * -> *. The data type is constrained with the old DatatypeContexts extension.
{-# LANGUAGE DatatypeContexts #-}
class Nat n
data Nat n => FSVec n a = FSVec -- ...
You have an existing class A :: (* -> *) -> Constraint which you'd like to write an FSVec instance for.
class A e where
--- ...
f :: ( {- ... -} b) -> e a -> e b
But FSVec can never have an A instance, because it's a kind mismatch. The class A requires a type argument with the kind * -> * but FSVec has the kind k -> * -> *. You've already run into a problem, and aren't even using the type family yet. If you try to do this (hand waving away what the type family argument is for now)
data X = X
instance A (FSVec) where
type Arg FSVec a = X
f = undefined
You get a compiler error.
Expecting one more argument to `FSVec'
The first argument of `A' should have kind `* -> *',
but `FSVec' has kind `* -> * -> *'
In the instance declaration for `A (FSVec)'
Everything before here, including the compiler error, is useful information for communicating the problem you are having and is useful in asking for help.
Fortunately it's a really easy problem to fix. If you pick some natural number n, then FSVec n has the kind * -> *, which matches the kind of the type argument to A. You can start writing an instance A (FSVec n)
instance A (FSVec n) where
f = -- ...
When you reintroduce the complete class definition with type families
{-# LANGUAGE TypeFamilies #-}
class A e where
type Arg e a
f :: (Arg e a -> b) -> e a -> e b
The solution is still to write an A instance for FSVec n instead of for FSVec. Now that n has moved into the instance declaration, there's an obvious place to capture the needed Nat n context.
instance Nat n => A (FSVec n) where
type Arg (FSVec n) a = FSVec n a
f = undefined -- ...
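As a compilable illustration of moving the size into the instance head, here is a sketch that does not depend on the type-level or parameterized-data packages. It assumes a stand-in `Vec` newtype indexed by a `GHC.TypeLits` natural, with `KnownNat` playing the role of the `Nat` class; `Vec` and the toy body of `f` are assumptions for the example, not the real `FSVec` API.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables, TypeFamilies #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

-- Hypothetical stand-in for FSVec: a list tagged with a type-level length.
newtype Vec (n :: Nat) a = Vec [a]
  deriving Show

class A (e :: Type -> Type) where
  type Arg e a :: Type
  f :: (Arg e a -> b) -> e a -> e b

-- The size n lives in the instance head, so the context can constrain it.
instance KnownNat n => A (Vec n) where
  type Arg (Vec n) a = Vec n a
  -- Toy implementation: apply g to the whole vector and fill the result
  -- with n copies of the answer, where n is read from the type.
  f g v = Vec (replicate len (g v))
    where
      len = fromIntegral (natVal (Proxy :: Proxy n))
```

For instance, `f (\(Vec xs) -> sum xs) (Vec [1, 2, 3] :: Vec 2 Int)` produces a `Vec 2 Int` holding two copies of the sum.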

Cirdec's answer explains one of the problems, but the solution it gives does not exactly answer the question posted. The question asks for an instance of the class A for X, with an FSVec type synonym.
The overarching issue that prevents defining type Arg X a = FSVec size a (in any possible configuration) is that type families are not injective. Knowing this and following Cirdec's reasoning, I can think of a workaround to achieve this goal: include a proxy "context" variable in X's type to overcome the mentioned issue.
data X c a = X a
instance (Nat n) => A (X n) where
    type Arg (X n) a = FSVec n a
    f = {- same as fIndependent -}
Of course, this is a quick fix that works for the minimal example (i.e. it answers the question posted), but might not scale well when composing multiple functions like f since there might appear type clashes between the inferred "contexts".
The best solution I can think of would be to add a constraint synonym (as suggested by this answer) for each instance, like:
import qualified Data.Param.FSVec as V
import Data.TypeLevel
import GHC.Exts -- for Constraint kind
class A e where
    type Arg e context a
    type Ctx e context :: Constraint
    f :: (Ctx e context) => (Arg e context a -> b) -> [e (Maybe a)] -> [e b]
instance A Y where
    type Arg Y c a = Maybe a
    type Ctx Y c = ()
    f = {- same as before -}
instance A X where
    type Arg X size a = V.FSVec size (Maybe a)
    type Ctx X size = Nat size
    f = {- same as fIndependent -}
But then we would have to deal with the ambiguous types that result from the infamous non-injectivity of type families (e.g. Could not deduce: Arg e context0 a ~ Arg e context a). In this case injectivity would have to be declared manually using the TypeFamilyDependencies extension (based on injective type families), available since GHC 8.0, defining Arg as:
type family Arg (e :: * -> *) context = (r :: * -> *) | r -> context
Of course, this is not possible if the design of the type family is not injective (which is my case), but it is the cleanest solution so far. It is definitely recommended if one can design her type family using the guidelines in the provided paper.
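To make the effect of an injectivity annotation concrete, here is a minimal standalone sketch. The family `Wire` and the function `roundTrip` are hypothetical and unrelated to `Arg`; they only show that a signature mentioning a type variable solely under a type family is accepted once the family is declared injective.

```haskell
{-# LANGUAGE TypeFamilyDependencies #-}

-- Hypothetical injective family: the result r determines the argument a.
type family Wire a = r | r -> a where
  Wire Int = Bool
  Wire Bool = Int

-- Without the "| r -> a" annotation this signature is rejected as
-- ambiguous, because a occurs only under the type family.
roundTrip :: Wire a -> Wire a
roundTrip = id
```

GHC checks the annotation against the equations, so a family whose right-hand sides overlap for different arguments cannot be declared this way.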

Is there a name for ADT with explicit subtyping?

I'm looking for a proper name for a data type that combines ADT with explicit subtyping.
In one of my applications, I use a structure similar to ADT to represent parse trees, on which I perform recursive pattern matching. I find it rather convenient if I could combine ADT with subtyping, as demonstrated in the example below:
Note: the example is written in Haskell's syntax, but this is not Haskell code.
data Empty = Empty
data Expr = Int Int | Add Expr AddOp Expr
data OptionalExpr =
| Empty // I want to make Empty a subtype of OptionalExpr
| Expr // I want to make Expr a subtype of OptionalExpr
In the example above, I first define 2 types: Empty and Expr. Then I make these 2 types the subtype of OptionalExpr. I realize this kind of data type is uncommon. Apparently neither Haskell nor OCaml support it. But I don't know about other functional languages.
I am looking for something that combines ADT with explicit subtyping, not structurally-implied subtyping as in polymorphic variant. There are a few justifications for this requirement:
First, we want all-or-none subtyping. Say we want A to be a subtype of B, then we will never want to include only some of the variants of A under B. Either A is a subtype of B, in which case B includes all the variants of A, or A is not a subtype of B, in which case B includes none of the variants of A. We don't allow gray-area in between.
Second, we don't want B to be open in any sense. We have in mind a very specific set of subtypes of B. We don't want something to become an instance of B just by implementing a typeclass or the like.
Third, say type A has a large number of variants. We want to make type B a supertype of A. Copying all the variants into B, as is required with polymorphic variant, is just too cumbersome and error-prone.
Fourth, we don't want to introduce new value-constructors when all we want to express is a subtype. In the example above, we could have written OptionalExpr as an ADT with 2 value-constructors, like this: data OptionalExpr = EmptyExpr | NonEmptyExpr Expr, or we could have used Maybe, but in my application this is unacceptable, because the level of embedding can be quite deep, and it would be a nightmare to deconstruct a deeply embedded value like (L1 (L2 (L3 (L4 (L5 value_wanted))))).
To give you some idea why such requirements exist, I show a more specific example below:
PrimaryExpr = ID | LeftParen Expr RightParen
UnaryExpr = PrimaryExpr | NegateOp PrimaryExpr // -
MultExpr = UnaryExpr | MultExpr MultOp UnaryExpr // *
AddExpr = MultExpr | AddExpr AddOp MultExpr // +
CompExpr = AddExpr | AddExpr CompOp AddExpr
Expr = CompExpr
The above example expresses a subtype hierarchy, and expresses ideas such as AddExpr is a CompExpr, but a CompExpr is not an AddExpr. For this specific example, some people have suggested to me that I can replace UnaryExpr, MultExpr, AddExpr and so on with just Expr. That is, I can define all the types as a single type. That loses type constraints such as CompExpr is not AddExpr, and because I'm doing recursive pattern matching on these types, I need the constraints of this hierarchy to be statically enforced.
Is there a name for this kind of data type I'm looking for in the literature? Or am I looking for something that doesn't even make sense? If you think this is the case, why am I looking for something nonsensical? Thanks for any pointers.
EDIT: even though I've written the above code snippets in Haskell's syntax, I am not writing my application in Haskell. I'm using my own language and my own data types, so I am not limited by Haskell's semantics. I am looking for a pointer to similar concepts in literature, so that when I write a report for my project I don't appear to be reinventing something new. I tried all the google keywords I can think of and nothing right was returned so I'm asking here.

In a comment, you say:
I'm not sure how to encode a subtype hierarchy using GADTs. If you think it is doable, would you mind providing an answer with an example as to how the type hierarchy given in my example may be encoded?
Therefore I give an answer to this question here. The key idea is to give a type-level function (in the host language, here Haskell) for computing the subtyping relation (of the target language's type system, here your custom EDSL). For simplicity, I will spell out the subtyping relation in full, but standard type-level programming can be used to reduce the repetition and raise the abstraction level as appropriate. First, the extensions needed:
{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
Now the definition of the subtyping relation:
data Level = Primary | Unary | Mult | Add | Comp
type family Subtype a b where
    Subtype Primary a = True
    Subtype Unary Primary = False
    Subtype Unary a = True
    Subtype Mult Primary = False
    Subtype Mult Unary = False
    Subtype Mult a = True
    Subtype Add Add = True
    Subtype Add Comp = True
    Subtype Add a = False
    Subtype Comp Comp = True
    Subtype Comp a = False
A closed type family is used to guarantee that the subtyping relation cannot be expanded by clients (your second property). Finally, the GADT for target language terms can use the subtyping relation as a constraint on its constructors.
data Expr a where
    ID :: Subtype Primary a ~ True => Expr a
    Paren :: Subtype Primary a ~ True => Expr b -> Expr a
    Negate :: Subtype Unary a ~ True => Expr Unary -> Expr a
    Times :: Subtype Mult a ~ True => Expr Mult -> Expr Mult -> Expr a
    Plus :: Subtype Add a ~ True => Expr Add -> Expr Add -> Expr a
    Compose :: Subtype Comp a ~ True => Expr Comp -> Expr Comp -> Expr a
Note that because the argument to Paren is polymorphic, you will need a type annotation on the contained term to express which "level" of the subtyping hierarchy you want that term to be treated as. I would expect you would need to do this in whatever language you are designing as well. In ghci, we can ask for the type of a sample term:
:t Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr Add)))
Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr Add)))
:: (Subtype 'Comp a ~ 'True) => Expr a
This is more or less the type you would expect for this term, I think. You can also see that the expression hierarchy is strictly enforced, though I dare say the error message is not 100% clear (since it is written in host language terms and not target language terms):
:t Negate (Plus ID ID)
<interactive>:1:9:
Couldn't match type ‘'False’ with ‘'True’
Expected type: 'True
Actual type: Subtype 'Add 'Unary
In the first argument of ‘Negate’, namely ‘(Plus ID ID)’
In the expression: Negate (Plus ID ID)
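The answer notes that type-level programming can reduce the repetition in the subtyping table. One possible way, sketched here as an assumption rather than as the author's method, is to give each level a numeric rank and define the relation through the `<=?` comparison from `GHC.TypeLits`. The names `Rank` and `size` are hypothetical, and only three constructors are kept for brevity.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies, TypeOperators #-}

import GHC.TypeLits

data Level = Primary | Unary | Mult | Add | Comp

-- One equation per level instead of one per pair of levels.
type family Rank (l :: Level) :: Nat where
  Rank 'Primary = 0
  Rank 'Unary = 1
  Rank 'Mult = 2
  Rank 'Add = 3
  Rank 'Comp = 4

-- a is a subtype of b exactly when its rank is not larger.
type Subtype (a :: Level) (b :: Level) = Rank a <=? Rank b

data Expr (a :: Level) where
  ID :: (Subtype 'Primary a ~ 'True) => Expr a
  Negate :: (Subtype 'Unary a ~ 'True) => Expr 'Unary -> Expr a
  Plus :: (Subtype 'Add a ~ 'True) => Expr 'Add -> Expr 'Add -> Expr a

-- Count the nodes of a term at any level of the hierarchy.
size :: Expr a -> Int
size ID = 1
size (Negate e) = 1 + size e
size (Plus l r) = 1 + size l + size r
```

With this encoding, adding a level means adding one `Rank` equation, and a term such as `Plus ID (Negate ID) :: Expr 'Comp` is accepted because every required comparison reduces to `'True`.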

Haskell is particularly amenable to modeling your domain, perhaps because it can be described with a fairly simple mathematical model. Crucially, your first point implies that the subtype relation is a well-order. This makes your life very easy - this model would likely translate easily to any language whose type system is at least as strong as that of Haskell.
Start by defining a type (which will be lifted to a kind) to represent your variants:
data Variant = Primary | Unary | Mult | Add | Comp | Expr
Next a non-recursive datatype to represent the nodes in your term language:
data ExprF (k :: Variant -> *) (x :: Variant) where
    ID_F :: ExprF k 'Primary
    Paren_F :: k 'Expr -> ExprF k 'Primary
    Negate_F :: k 'Primary -> ExprF k 'Unary
    Mult_F :: k 'Mult -> k 'Unary -> ExprF k 'Mult
    Add_F :: k 'Add -> k 'Mult -> ExprF k 'Add
    Comp_F :: k 'Add -> k 'Add -> ExprF k 'Comp
Recursive occurrences of terms are represented by an additional parameter. Essentially this is just the typical polynomial functor representation (i.e. Fix) but with an index parameter.
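For readers unfamiliar with the fixed-point representation mentioned here, the following sketch shows the ordinary unindexed `Fix` with a small fold, followed by the usual shape of its indexed counterpart. The names `ArithF`, `cata`, `eval` and `IFix` are assumptions for illustration; the answer's own types interleave the fixed point with a subtype check rather than using `IFix` directly.

```haskell
{-# LANGUAGE DeriveFunctor, KindSignatures, PolyKinds #-}

import Data.Kind (Type)

-- Ordinary fixed point: the parameter of f marks recursive positions.
newtype Fix f = Fix (f (Fix f))

-- A non-recursive node type for a tiny arithmetic language.
data ArithF r = Lit Int | Plus r r
  deriving Functor

-- Fold a tree bottom-up using an algebra for one layer.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (Fix x) = alg (fmap (cata alg) x)

eval :: Fix ArithF -> Int
eval = cata alg
  where
    alg (Lit n) = n
    alg (Plus a b) = a + b

-- Indexed variant: recursive positions are indexed, so each constructor
-- can demand a specific index for its children and produce its own.
newtype IFix (f :: (k -> Type) -> k -> Type) (i :: k) = IFix (f (IFix f) i)
```

In the indexed form the extra parameter plays the role that `Variant` plays in `ExprF`.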
Your expression type is then:
data Expr' (x :: Variant) where
    Expr' :: (x <= y) => Expr x -> Expr' y
data Expr (x :: Variant) where
    MkExpr :: ExprF Expr' x -> Expr x
The <= class has not been introduced yet, but it represents your subtype relation.
As mentioned previously, your subtype relation is a well-order, and by this virtue each element in the ordering can be assigned a unique natural number such that the typical ordering on naturals respects your subtype relation. Or in other words, there is an injection f : Variant -> Nat such that x is a subtype of y iff f x <= f y (or strict subtype iff f x < f y - such a representation gives you a lot of generality).
The required injection is just given by your grammar. Note that each production is only a "subtype" (i.e. has a right-hand side which should not introduce a constructor) of productions above it.
data Nat = Z | S Nat
infixr 0 $
type ($) f a = f a
type family VariantIx (x :: Variant) :: Nat where
    VariantIx 'Primary = 'Z
    VariantIx 'Unary = 'S 'Z
    VariantIx 'Mult = 'S $ 'S 'Z
    VariantIx 'Add = 'S $ 'S $ 'S 'Z
    VariantIx 'Comp = 'S $ 'S $ 'S $ 'S 'Z
    VariantIx 'Expr = 'S $ 'S $ 'S $ 'S $ 'S 'Z
You need an implicit subtype relation (which is <=) but it is often much easier to work with an explicit proof of the relation, so it is typical that the implicit version simply generates the explicit proof. To this end you write two declarations:
data family (:<=:) (x :: k) (y :: k)
class (<=) (x :: k) (y :: k) where
    isLTEQ :: x :<=: y
The instances for naturals should be fairly obvious:
data instance (:<=:) (x :: Nat) y where
    LT_Z :: 'Z :<=: n
    LT_S :: n :<=: m -> 'S n :<=: 'S m
instance 'Z <= n where isLTEQ = LT_Z
instance (n <= m) => 'S n <= 'S m where isLTEQ = LT_S isLTEQ
and the instances for Variant define the order induced by VariantIx:
newtype instance (:<=:) (x :: Variant) y = IsSubtype (VariantIx x :<=: VariantIx y)
instance (VariantIx x <= VariantIx y) => x <= y where isLTEQ = IsSubtype isLTEQ
You probably want some smart constructors. If you are using a recent GHC you will have access to pattern synonyms, but it isn't necessary:
id_ = MkExpr ID_F
pattern Id = MkExpr ID_F
pattern Paren e = MkExpr (Paren_F (Expr' e))
pattern Neg e = MkExpr (Negate_F (Expr' e))
infixl 6 :+
pattern (:+) a b = MkExpr (Add_F (Expr' a) (Expr' b))
infixl 7 :*
pattern (:*) a b = MkExpr (Mult_F (Expr' a) (Expr' b))
pattern Cmp a b = MkExpr (Comp_F (Expr' a) (Expr' b))
and some simple examples:
>Id :+ Id :+ Neg Id :* Id
Add_F (Add_F ID_F ID_F) (Mult_F (Negate_F ID_F) ID_F)
>Id :+ Id :* Neg (Id :* Id)
<interactive>:6:13:
No instance for (('S $ 'S 'Z) <= 'Z) arising from a use of `Neg'
Note that you could also write your expression type in a slightly different way:
data ExprFlip (x :: Variant) where
MkExprFlip :: (x <= y) => ExprF ExprFlip x -> ExprFlip y
This differs from the original in that the outermost type of an expression has the subtype relation applied to it - so e.g.
pattern Id' = MkExprFlip ID_F
has type ExprFlip t while Id :: Expr 'Primary. I can't see any other way in which they differ, and I imagine that it would simply be a matter of preference, or which use cases are most common. The original presentation has the advantage that the output type is always monomorphic, which may make type inference better in some cases, but does not affect the construction of expressions.
To address your four points:
1. This model relies on the semantics of the subtype relation by design.
2. VariantIx and the Variant type are closed. Any additional instances for :<=: or <= for Variant or Nat will overlap with the existing ones (which are as general as possible), so while in principle they can be defined, attempting to use them will produce type errors.
3. Essentially you have a reflexive and transitive relation, and these properties are captured in the <= instance for Nat once and for all. Changing the subtype relation amounts only to changing Variant and VariantIx.
4. The proofs of the subtype relation are constructed by type inference, by the <= class. Since all of the indices in the ExprF datatype are monomorphic, the type checker will always be able to compute the subtype relation for the indices.
Full code:
{-# LANGUAGE StandaloneDeriving, UndecidableInstances, PatternSynonyms
, TypeOperators, KindSignatures, PolyKinds, DataKinds, GADTs, TypeFamilies
, MultiParamTypeClasses, FlexibleContexts, FlexibleInstances #-}
data Variant = Primary | Unary | Mult | Add | Comp | Expr
data ExprF (k :: Variant -> *) (x :: Variant) where
ID_F :: ExprF k 'Primary
Paren_F :: k 'Expr -> ExprF k 'Primary
Negate_F :: k 'Primary -> ExprF k 'Unary
Mult_F :: k 'Mult -> k 'Unary -> ExprF k 'Mult
Add_F :: k 'Add -> k 'Mult -> ExprF k 'Add
Comp_F :: k 'Add -> k 'Add -> ExprF k 'Comp
data Expr' (x :: Variant) where
Expr' :: (x <= y) => Expr x -> Expr' y
data Expr (x :: Variant) where
MkExpr :: ExprF Expr' x -> Expr x
data ExprFlip (x :: Variant) where
MkExprFlip :: (x <= y) => ExprF ExprFlip x -> ExprFlip y
pattern Id' = MkExprFlip ID_F
data Nat = Z | S Nat
infixr 0 $
type ($) f a = f a
type family VariantIx (x :: Variant) :: Nat where
VariantIx 'Primary = 'Z
VariantIx 'Unary = 'S 'Z
VariantIx 'Mult = 'S $ 'S 'Z
VariantIx 'Add = 'S $ 'S $ 'S 'Z
VariantIx 'Comp = 'S $ 'S $ 'S $ 'S 'Z
VariantIx 'Expr = 'S $ 'S $ 'S $ 'S $ 'S 'Z
data family (:<=:) (x :: k) (y :: k)
class (<=) (x :: k) (y :: k) where
isLTEQ :: x :<=: y
data instance (:<=:) (x :: Nat) y where
LT_Z :: 'Z :<=: n
LT_S :: n :<=: m -> 'S n :<=: 'S m
instance 'Z <= n where isLTEQ = LT_Z
instance (n <= m) => 'S n <= 'S m where isLTEQ = LT_S isLTEQ
newtype instance (:<=:) (x :: Variant) y = IsSubtype (VariantIx x :<=: VariantIx y)
instance (VariantIx x <= VariantIx y) => x <= y where isLTEQ = IsSubtype isLTEQ
id_ = MkExpr ID_F
pattern Id = MkExpr ID_F
pattern Paren e = MkExpr (Paren_F (Expr' e))
pattern Neg e = MkExpr (Negate_F (Expr' e))
infixl 6 :+
pattern (:+) a b = MkExpr (Add_F (Expr' a) (Expr' b))
infixl 7 :*
pattern (:*) a b = MkExpr (Mult_F (Expr' a) (Expr' b))
pattern Cmp a b = MkExpr (Comp_F (Expr' a) (Expr' b))
instance Show (Expr' x) where
showsPrec k (Expr' x) = showsPrec k x
instance Show (Expr x) where
showsPrec k (MkExpr x) = showsPrec k x
deriving instance (Show (k 'Mult), Show (k 'Add), Show (k 'Expr), Show (k 'Primary), Show (k 'Unary)) => Show (ExprF k x)

Unless I misunderstand, polymorphic variants can do pretty much exactly this. However, "untagged union" isn't a great term to use (I imagine most people would think you were asking for C-style unions).
The example would look like this:
type empty = [`Empty]
type bin_op = Add | Sub
type expr = [`Int of int | `Add of expr * bin_op * expr]
type optional_expr = [empty | expr]
type weird_expr = [expr | `Wierd of expr | `Zonk of string]
Note that with OCaml's polymorphic variants the subtype relationship is defined structurally and not between named types.

There are two things that come to my mind, both for "real" subtyping systems (so not available in Haskell), although I am not completely sure whether any of them fits all of your requirements:
Explicit untagged union types, as in Ceylon, which let you name the type A | B, which is a supertype of both A and B. Thus, you could just make Empty and Expr normal ADTs, and then declare a synonym type OptionalExpr = Empty | Expr.
The way ADTs are modelled in Scala, as hierarchies of sealed traits and case classes:
sealed trait OptionalExpr
case object Empty extends OptionalExpr
sealed trait Expr extends OptionalExpr
case class IntExpr(i: Int) extends Expr
case class AddExpr(lhs: Expr, op: AddOp, rhs: Expr) extends Expr
This way, OptionalExpr and Expr are not extensible (since the traits are sealed) and behave mostly like ADTs in Haskell, but you can still access the "intermediate" types like in a normal inheritance hierarchy (unlike in Haskell, where you have only the constructors, which are not by themselves types).
Both cases require a form of pattern matching to access values, of course, since you have to recover in which "part of the union" you are.

Type families for dummies

Could someone give a super simple (few-line) example to get a basic understanding of what type families are and what they can be used for?
The 2+2 kind of example of type families?

Here's an example:
{-# Language TypeFamilies, DataKinds, KindSignatures, GADTs, UndecidableInstances #-}
data Nat = Z | S Nat
type family Plus (x :: Nat) (y :: Nat) :: Nat where
Plus 'Z y = y
Plus ('S x) y = 'S (Plus x y)
data Vec :: Nat -> * -> * where
Nil :: Vec 'Z a
Cons :: a -> Vec n a -> Vec ('S n) a
append :: Vec m a -> Vec n a -> Vec (Plus m n) a
append Nil ys = ys
append (Cons x xs) ys = Cons x (append xs ys)
Note that many/most interesting applications of type families require UndecidableInstances. You should not be scared of this extension.
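To see the type-level addition at work, here is a minimal self-contained sketch of the same definitions with a small usage example. The names toList and three are hypothetical additions for illustration and are not part of the answer above; the sketch also writes Type (from Data.Kind) instead of *.

```haskell
{-# LANGUAGE TypeFamilies, DataKinds, KindSignatures, GADTs #-}
import Data.Kind (Type)

data Nat = Z | S Nat

-- Type-level addition on promoted naturals.
type family Plus (x :: Nat) (y :: Nat) :: Nat where
  Plus 'Z     y = y
  Plus ('S x) y = 'S (Plus x y)

-- Length-indexed vectors.
data Vec :: Nat -> Type -> Type where
  Nil  :: Vec 'Z a
  Cons :: a -> Vec n a -> Vec ('S n) a

append :: Vec m a -> Vec n a -> Vec (Plus m n) a
append Nil         ys = ys
append (Cons x xs) ys = Cons x (append xs ys)

-- Hypothetical helper (not in the answer): forget the length index.
toList :: Vec n a -> [a]
toList Nil         = []
toList (Cons x xs) = x : toList xs

-- The annotation is accepted because Plus ('S ('S 'Z)) ('S 'Z)
-- reduces to 'S ('S ('S 'Z)) during type checking.
three :: Vec ('S ('S ('S 'Z))) Int
three = append (Cons 1 (Cons 2 Nil)) (Cons 3 Nil)

main :: IO ()
main = print (toList three)  -- prints [1,2,3]
```

Changing the annotation on three to a different length should be rejected by the type checker, which is the point of indexing Vec by its length.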
Another useful sort of type family is one associated with a class. For a really contrived example,
class Box b where
type Elem b :: *
elem :: b -> Elem b
An instance of Box is a type that something can be pulled out of. For instance,
instance Box (Identity x) where
type Elem (Identity x) = x
elem = runIdentity
instance Box Char where
type Elem Char = String
elem c = [c]
Now elem (Identity 3) = 3 and elem 'x' = "x".
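The Box example can be assembled into a runnable module roughly as follows. This is a sketch under two assumptions that are not spelled out above: Identity comes from Data.Functor.Identity, and Prelude's own elem is hidden so that the class method name does not clash with it at use sites.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Prelude hiding (elem)  -- avoid ambiguity with Prelude's list-membership elem
import Data.Functor.Identity (Identity (..))
import Data.Kind (Type)

class Box b where
  -- Associated type family: the type of the thing inside the box.
  type Elem b :: Type
  elem :: b -> Elem b

instance Box (Identity x) where
  type Elem (Identity x) = x
  elem = runIdentity

instance Box Char where
  type Elem Char = String
  elem c = [c]

main :: IO ()
main = do
  print (elem (Identity (3 :: Int)))  -- prints 3
  putStrLn (elem 'x')                 -- prints x
```

Note that the result type of elem differs per instance: it is computed from the argument type by the Elem family.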
You can also use type families to make weird skolem variables. This is best done in the as-yet-unreleased GHC 8.0.1, where it will look like
type family Any :: k where {}
Any is a peculiar type. It's uninhabited, it can't be (specifically) an instance of a class, and it's poly-kinded. This turns out to be really useful for certain purposes. This particular type is advertised as a safe target for unsafeCoerce, but Data.Constraint.Forall uses similar type families for more interesting purposes.

how to reify a list of data using Data.Reify?

I tried to read the paper (http://www.ittc.ku.edu/csdl/fpg/sites/default/files/Gill-09-TypeSafeReification.pdf) and managed to reify my symbolic expression type, but I can't figure out how to reify a list of them. Here's the simplified code:
{-# OPTIONS_GHC -Wall #-}
{-# Language TypeOperators #-}
{-# Language TypeFamilies #-}
{-# Language FlexibleInstances #-}
import Control.Applicative
import Data.Reify
-- symbolic expression type
data Expr a = EConst a
| EBin (Expr a) (Expr a)
deriving Show
-- corresponding node type
data GraphExpr a b = GConst a
| GBin b b
deriving Show
instance MuRef (Expr a) where
type DeRef (Expr a) = GraphExpr a
mapDeRef _ (EConst c) = pure (GConst c)
mapDeRef f (EBin u v) = GBin <$> f u <*> f v
-- this works as expected
main :: IO ()
main = reifyGraph (EBin x (EBin x y)) >>= print
where
x = EConst "x"
y = EConst "y"
-- (output: "let [(1,GBin 2 3),(3,GBin 2 4),(4,GConst "y"),(2,GConst "x")] in 1")
-- but what if I want to reify a list of Exprs?
data ExprList a = ExprList [Expr a]
data GraphList a b = GraphList [GraphExpr a b]
instance MuRef (ExprList a) where
type DeRef (ExprList a) = GraphList a
-- mapDeRef f (ExprList xs) = ???????

I had the exact same problem, and I found a solution using data-reify.
The things you have to realise to arrive at a solution are:
1. Even though the EDSL doesn't have lists, the graph type can contain them.
2. It is possible to reify different types of data to the same result type.
So we start by adding list constructors to our result type:
data GraphExpr a b = GConst a
| GBin b b
| Cons b b
| Nil
deriving Show
Then we need a second instance of MuRef, that reifies lists of Expr a to GraphExpr.
instance MuRef [Expr a] where
type DeRef [Expr a] = GraphExpr a
mapDeRef _ [] = pure Nil
mapDeRef f (x:xs) = Cons <$> f x <*> f xs
Now with this in place, if we try to reify a list expression
reified = reifyGraph [EBin x (EBin x y), EBin y (EBin x y)]
where x = EConst "x"
y = EConst "y"
We'll get the result
let [(1,Cons 2 6),(6,Cons 7 9),(9,Nil),(7,GBin 5 8),(8,GBin 3 5),(2,GBin 3 4),(4,GBin 3 5),(5,GConst "y"),(3,GConst "x")] in 1
To extract the list of reified node-ids from this graph we can define a little function to walk the Conses and to extract the node-ids from them into a list.
walkConses :: Graph (GraphExpr t) -> [Unique]
walkConses (Graph xs root) = go (lookup root xs)
where
go (Just (Cons n1 n2)) = n1 : go (lookup n2 xs)
go (Just Nil) = []
(If the graphs are huge, it might be a good idea to convert them to an IntMap before starting the walk.)
This is a partial function, but since we reify a list we know that the root of the DAG will always be a Cons or Nil node, and since we know that all node-ids are in xs, it will return a list of all node-ids in the result list.
So if we run walkConses on our resulting graph we'll get the result:
[2, 7]
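If the partiality is a concern, the walk can be made total and combined with the IntMap idea. The following is only a sketch: to stay independent of data-reify it uses a local copy of the node type and a plain list of (node-id, node) pairs instead of the library's Graph, and walkConsesSafe and example are hypothetical names.

```haskell
import qualified Data.IntMap.Strict as IntMap

-- Local stand-in for the answer's node type, so the sketch needs no data-reify.
data GraphExpr a b = GConst a | GBin b b | Cons b b | Nil
  deriving Show

-- Hypothetical total variant of walkConses over a (node-id, node) list.
-- Returns Nothing if an id is missing or the spine hits a non-list node.
walkConsesSafe :: [(Int, GraphExpr a Int)] -> Int -> Maybe [Int]
walkConsesSafe nodes root = go root
  where
    table = IntMap.fromList nodes  -- built once, then reused for every lookup
    go n = case IntMap.lookup n table of
      Just (Cons hd tl) -> (hd :) <$> go tl
      Just Nil          -> Just []
      _                 -> Nothing

-- The node list from the reified result shown above.
example :: [(Int, GraphExpr String Int)]
example =
  [ (1, Cons 2 6), (6, Cons 7 9), (9, Nil), (7, GBin 5 8), (8, GBin 3 5)
  , (2, GBin 3 4), (4, GBin 3 5), (5, GConst "y"), (3, GConst "x") ]

main :: IO ()
main = print (walkConsesSafe example 1)  -- prints Just [2,7]
```

Starting the walk at a node that is not part of the list spine, such as node 2 here, yields Nothing instead of a pattern-match failure.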
Hope this helps, I've been wrestling with this problem for a while too.

You really can't do that with MuRef. GraphLists don't contain GraphLists. You can reify each Expr in turn and write a one-off combinator to smash them into your GraphList though:
Just use traverse reifyGraph over the ExprList contents.
Also, both of the latter should probably be newtypes.
