What is the right way to typecheck dependent lambda abstraction using 'bound'? - haskell

I am implementing a simple dependently-typed language, similar to the one described by Lennart Augustsson, while also using bound to manage bindings.
When typechecking a dependent lambda term, such as λt:* . λx:t . x, I need to:
"Enter" the outer lambda binder, by instantiating t to something
Typecheck λx:t . x, yielding ∀x:t . t
Pi-abstract the t, yielding ∀t:* . ∀x:t . t
If lambda was non-dependent, I could get away with instantiating t with its type on step 1, since the type is all I need to know about the variable while typechecking on step 2.
But on step 3 I lack the information to decide which variables to abstract over.
I could introduce a fresh name supply and instantiate t with a Bound.Name.Name containing both the type and a unique name. But I thought that with bound I shouldn't need to generate fresh names.
Is there an alternative solution I'm missing?

We need some kind of context to keep track of the lambda arguments. However, we don't necessarily need to instantiate them, since bound gives us de Bruijn indices, and we can use those indices to index into the context.
Actually using the indices is a bit involved, though, because of the type-level machinery that reflects the size of the current scope (or in other words, the current depth in the expression) through the nesting of Var-s. It necessitates the use of polymorphic recursion or GADTs. It also prevents us from storing the context in a State monad (because the size and thus the type of the context changes as we recurse). I wonder though if we could use an indexed state monad; it'd be a fun experiment. But I digress.
The simplest solution is to represent the context as a function:
type TC a = Either String a -- our checker monad
type Cxt a = a -> TC (Type a) -- the context
The a input is essentially a de Bruijn index, and we look up a type by applying the function to the index. We can define the empty context the following way:
emptyCxt :: Cxt a
emptyCxt = const $ Left "variable not in scope"
And we can extend the context:
consCxt :: Type a -> Cxt a -> Cxt (Var () a)
consCxt ty cxt (B ()) = pure (F <$> ty)
consCxt ty cxt (F a) = (F <$>) <$> cxt a
The size of the context is encoded in the Var nesting. The increase in the size is apparent here in the return type.
Now we can write the type checker. The main point here is that we use fromScope and toScope to get under binders, and we carry along an appropriately extended Cxt (whose type lines up just perfectly).
data Term a
= Var a
| Star -- or alternatively, "Type", or "*"
| Lam (Type a) (Scope () Term a)
| Pi (Type a) (Scope () Term a)
| App (Type a) (Term a)
deriving (Show, Eq, Functor)
-- boilerplate omitted (Monad, Applicative, Eq1, Show1 instances)
-- reduce to normal form
rnf :: Term a -> Term a
rnf = ...
-- Note: IIRC "Simply easy" and Augustsson's post reduces to whnf
-- when type checking. I use here plain normal form, because it
-- simplifies the presentation a bit and it also works fine.
-- We rely on Bound's alpha equality here, and also on the fact
-- that we keep types in normal form, so there's no need for
-- additional reduction.
check :: Eq a => Cxt a -> Type a -> Term a -> TC ()
check cxt want t = do
have <- infer cxt t
when (want /= have) $ Left "type mismatch"
infer :: Eq a => Cxt a -> Term a -> TC (Type a)
infer cxt = \case
Var a -> cxt a
Star -> pure Star -- "Type : Type" system for simplicity
Lam ty t -> do
check cxt Star ty
let ty' = rnf ty
Pi ty' . toScope <$> infer (consCxt ty' cxt) (fromScope t)
Pi ty t -> do
check cxt Star ty
check (consCxt (rnf ty) cxt) Star (fromScope t)
pure Star
App f x ->
infer cxt f >>= \case
Pi ty t -> do
check cxt ty x
pure $ rnf (instantiate1 x t)
_ -> Left "can't apply non-function"
Here's the working code containing the above definitions. I hope I didn't mess it up too badly.

Related

Confused about GADTs and propagating constraints

There's plenty of Q&A about GADTs being better than DatatypeContexts, because GADTs automagically make constraints available in the right places. For example here, here, here. But sometimes it seems I still need an explicit constraint. What's going on? Example adapted from this answer:
{-# LANGUAGE GADTs #-}
import Data.Maybe -- fromJust
data GADTBag a where
MkGADTBag :: Eq a => { unGADTBag :: [a] } -> GADTBag a
baz (MkGADTBag x) (Just y) = x == y
baz2 x y = unGADTBag x == fromJust y
-- unGADTBag :: GADTBag a -> [a] -- inferred, no Eq a
-- baz :: GADTBag a -> Maybe [a] -> Bool -- inferred, no Eq a
-- baz2 :: Eq a => GADTBag a -> Maybe [a] -> Bool -- inferred, with Eq a
Why can't the type for unGADTBag tell us Eq a?
baz and baz2 are morally equivalent, yet have different types. Presumably because unGADTBag has no Eq a, then the constraint can't propagate into any code using unGADTBag.
But with baz2 there's an Eq a constraint hiding inside the GADTBag a. Presumably baz2's Eq a will want a duplicate of the dictionary already there(?)
Is it that potentially a GADT might have many data constructors, each with different (or no) constraints? That's not the case here, or with typical examples for constrained data structures like Bags, Sets, Ordered Lists.
The equivalent for a GADTBag datatype using DatatypeContexts infers baz's type same as baz2.
Bonus question: why can't I get an ordinary ... deriving (Eq) for GADTBag? I can get one with StandaloneDeriving, but it's blimmin obvious, why can't GHC just do it for me?
deriving instance (Eq a) => Eq (GADTBag a)
Is the problem again that there might be other data constructors?
(Code exercised at GHC 8.6.5, if that's relevant.)
Addit: in light of #chi's and #leftroundabout's answers -- neither of which I find convincing. All of these give *** Exception: Prelude.undefined:
*DTContexts> unGADTBag undefined
*DTContexts> unGADTBag $ MkGADTBag undefined
*DTContexts> unGADTBag $ MkGADTBag (undefined :: String)
*DTContexts> unGADTBag $ MkGADTBag (undefined :: [a])
*DTContexts> baz undefined (Just "hello")
*DTContexts> baz (MkGADTBag undefined) (Just "hello")
*DTContexts> baz (MkGADTBag (undefined :: String)) (Just "hello")
*DTContexts> baz2 undefined (Just "hello")
*DTContexts> baz2 (MkGADTBag undefined) (Just "hello")
*DTContexts> baz2 (MkGADTBag (undefined :: String)) (Just "hello")
Whereas these two give the same type error at compile time * Couldn't match expected type ``[Char]'* No instance for (Eq (Int -> Int)) arising from a use of ``MkGADTBag'/ ``baz2' respectively [Edit: my initial Addit gave the wrong expression and wrong error message]:
*DTContexts> baz (MkGADTBag (undefined :: [Int -> Int])) (Just [(+ 1)])
*DTContexts> baz2 (MkGADTBag (undefined :: [Int -> Int])) (Just [(+ 1)])
So baz, baz2 are morally equivalent not just in that they return the same result for the same well-defined arguments; but also in that they exhibit the same behaviour for the same ill-defined arguments. Or they differ only in where the absence of an Eq instance gets reported?
#leftroundabout Before you've actually deconstructed the x value, there's no way of knowing that the MkGADTBag constructor indeed applies.
Yes there is: field label unGADTBag is defined if and only if there's a pattern match on MkGADTBag. (It would maybe be different if there were other constructors for the type -- especially if those also had a label unGADTBag.) Again, being undefined/lazy evaluation doesn't postpone the type-inference.
To be clear, by "[not] convincing" I mean: I can see the behaviour and the inferred types I'm getting. I don't see that laziness or potential undefinedness gets in the way of type inference. How could I expose a difference between baz, baz2 that would explain why they have different types?
Function calls never bring type class constraints in scope, only (strict) pattern matching does.
The comparison
unGADTBag x == fromJust y
is essentially a function call of the form
foo (unGADTBag x) (fromJust y)
where foo requires Eq a. That would morally be provided by unGADTBag x, but that expression is not yet evaluated! Because of laziness, unGADTBag x will be evaluated only when (and if) foo demands its first argument.
So, in order to call foo in this example we need its argument to be evaluated in advance. While Haskell could work like this, it would be a rather surprising semantics, where arguments are evaluated or not depending on whether they provide a type class constraint which is needed. Imagine more general cases like
foo (if cond then unGADTBag x else unGADTBag z) (fromJust y)
What should be evaluated here? unGADTBag x? unGADTBag y? Both? cond as well? It's hard to tell.
Because of these issues, Haskell was designed so that we need to manually require the evaluation of a GADT value like x using pattern matching.
Why can't the type for unGADTBag tell us Eq a?
Before you've actually deconstructed the x value, there's no way of knowing that the MkGADTBag constructor indeed applies. Sure, if it doesn't then you have other problems (bottom), but those might conceivably not surface. Consider
ignore :: a -> b -> b
ignore _ = id
baz2' :: GADTBag a -> Maybe [a] -> Bool
baz2' x y = ignore (unGADTBag x) (y==y)
Note that I could now invoke the function with, say, undefined :: GADTBag (Int->Int). Shouldn't be a problem since the undefined is ignored, right★? Problem is, despite Int->Int not having an Eq instance, I was able to write y==y, which y :: Maybe [Int->Int] can't in fact support.
So, we can't have that only mentioning unGADTBag is enough to spew the Eq a constraint into its surrounding scope. Instead, we must clearly delimit the scope of that constraint to where we've confirmed that the MkGADTBag constructor does apply, and a pattern match accomplishes that.
★If you're annoyed that my argument relies on undefined, note that the same issue arises also when there are multiple constructors which would bring different constraints into scope.
An alternative to a pattern-match that does work is this:
{-# LANGUAGE RankNTypes #-}
withGADTBag :: GADTBag a -> (Eq a => [a] -> b) -> b
withGADTBag (MkGADTBag x) f = f x
baz3 :: GADTBag a -> Maybe [a] -> Bool
baz3 x y = withGADTBag x (== fromJust y)
Response to edits
All of these give *** Exception: Prelude.undefined:
Yes of course they do, because you actually evaluate x == y in your function. So the function can only possibly yield non-⟂ if the inputs have a NF. But that's by no means the case for all functions.
Whereas these two give the same type error at compile time
Of course they do, because you're trying to wrap a value of non-Eq type in the MkGADTBag constructor, which explicitly requires that constraint (and allows you to explicitly unwrap it again!), whereas the GADTBag type doesn't require that constraint. (Which is kind of the whole point about this sort of encapsulation!)
Before you've actually deconstructed the x value, there's no way of knowing that the `MkGADTBag` constructor indeed applies.Yes there is: field label `unGADTBag` is defined if and only if there's a pattern match on `MkGADTBag`.
Arguably, that's the way field labels should work, but they don't, in Haskell. A field label is nothing but a function from the data type to the field type, and a nontotal function at that if there are multiple constructors.Yeah, Haskell records are one of the worst-designed features of the language. I personally tend to use field labels only for big, single-constructor, plain-old-data types (and even then I prefer using not the field labels directly but lenses derived from them).
Anyway though, I don't see how “field label is defined iff there's a pattern match” could even be implemented in a way that would allow your code to work the way you think it should. The compiler would have to insert the step of confirming that the constructor applies (and extracting its GADT-encapsulated constraint) somewhere. But where? In your example it's reasonably obvious, but in general x could inhabit a vast scope with lots of decision branches and you really don't want it to get evaluated in a branch where the constraint isn't actually needed.
Also keep in mind that when we argue with undefined/⟂ it's not just about actually diverging computations, more typically you're worried about computations that would simply take a long time (just, Haskell doesn't actually have a notion of “taking a long time”).
The way to think about this is OutsideIn(X) ... with local assumptions. It's not about undefinedness or lazy evaluation. A pattern match on a GADT constructor is outside, the RHS of the equation is inside. Constraints from the constructor are made available only locally -- that is only inside.
baz (MkGADTBag x) (Just y) = x == y
Has an explicit data constructor MkGADTBag outside, supplying an Eq a. The x == y raises a wanted Eq a locally/inside, which gets discharged from the pattern match. OTOH
baz2 x y = unGADTBag x == fromJust y
Has no explicit data constructor outside, so no context is supplied. unGADTBag has a Eq a, but that is deeper inside the l.h. argument to ==; type inference doesn't go looking deeper inside. It just doesn't. Then in the effective definition for unGADTBag
unGADTBag (MkGADTBag x) = x
there is an Eq a made available from the outside, but it cannot escape from the RHS into the type environment at a usage site for unGADTBag. It just doesn't. Sad!
The best I can see for an explanation is towards the end of the OutsideIn paper, Section 9.7 Is the emphasis on principal types well-justified? (A rhetorical question but my answer would me: of course we must emphasise principal types; type inference could get better principaled under some circumstances.) That last section considers this example
data R a where
RInt :: Int -> R Int
RBool :: Bool -> R Bool
RChar :: Char -> R Char
flop1 (RInt x) = x
there is a third type that is arguably more desirable [for flop1], and that type is R Int -> Int.
flop1's definition is of the same form as unGADTBag, with a constrained to be Int.
flop2 (RInt x) = x
flop2 (RBool x) = x
Unfortunately, ordinary polymorphic types are too weak to express this restriction [that a must be only Int or Bool] and we can only get Ɐa.R a -> a for flop2, which does not rule the application of flop2 to values of type R Char.
So at that point the paper seems to give up trying to refine better principal types:
In conclusion, giving up on some natural principal types in favor of more specialized types that eliminate more pattern match errors at runtime is appealing but does not quite work unless we consider a more expressive syntax of types. Furthermore it is far from obvious how to specify these typings in a high-level declarative specification.
"is appealing". It just doesn't.
I can see a general solution is difficult/impossible. But for use-cases of constrained Bags/Lists/Sets, the specification is:
All data constructors have the same constraint(s) on the datatype's parameters.
All constructors yield the same type (... -> T a or ... -> T [a] or ... -> T Int, etc).
Datatypes with a single constructor satisfy that trivially.
To satisfy the first bullet, for a Set type using a binary balanced tree, there'd be a non-obvious definition for the Nil constructor:
data OrdSet a where
SNode :: Ord a => OrdSet a -> a -> OrdSet a -> OrdSet a
SNil :: Ord a => OrdSet a -- seemingly redundant Ord constraint
Even so, repeating the constraint on every node and every terminal seems wasteful: it's the same constraint all the way down (which is unlike GADTs for EDSL abstract syntax trees); presumably each node carries a copy of exactly the same dictionary.
The best way to ensure same constraint(s) on every constructor could just be prefixing the constraint to the datatype:
data Ord a => OrdSet a where ...
And perhaps the constraint could go 'OutsideOut' to the environment that's accessing the tree.
Another possible approach is to use a PatternSynonym with an explicit signature giving a Required constraint.
pattern EqGADTBag :: Eq a => [a] -> GADTBag a -- that Eq a is the *Required*
pattern EqGADTBag{ unEqGADTBag } = MkGADTBag unEqGADTBag -- without sig infers Eq a only as *Provided*
That is, without that explicit sig:
*> :i EqGADTBag
pattern EqGADTBag :: () => Eq a => [a] -> GADTBag a
The () => Eq a => ... shows Eq a is Provided, arising from the GADT constructor.
Now we get both inferred baz, baz2 :: Eq a => GADTBag a -> Maybe [a] -> Bool:
baz (EqGADTBag x) (Just y) = x == y
baz2 x y = unEqGADTBag x == fromJust y
As a curiosity: it's possible to give those equations for baz, baz2 as well as those in the O.P. using the names from the GADT decl. GHC warns of overlapping patterns [correctly]; and does infer the constrained sig for baz.
I wonder if there's a design pattern here? Don't put constraints on the data constructor -- that is, don't make it a GADT. Instead declare a 'shadow' PatternSynonym with the Required/Provided constraints.
You can capture the constraint in a fold function, (Eq a => ..) says you can assume Eq a but only within the function next (which is defined after a pattern match). If you instantiate next as = fromJust maybe == as it uses this constraint to witness equality
-- local constraint
-- |
-- vvvvvvvvvvvvvvvvvv
foldGADTBag :: (Eq a => [a] -> res) -> GADTBag a -> res
foldGADTBag next (MkGADTBag as) = next as
baz3 :: GADTBag a -> Maybe [a] -> Bool
baz3 gadtBag maybe = foldGADTBag (fromJust maybe ==) gadtBag
type Ty :: Type -> Type
data Ty a where
TyInt :: Int -> Ty Int
TyUnit :: Ty ()
-- locally assume Int locally assume unit
-- | |
-- vvvvvvvvvvvvvvvvvvv vvvvvvvvvvvvv
foldTy :: (a ~ Int => a -> res) -> (a ~ () => res) -> (Ty a -> res)
foldTy int unit (TyInt i) = int i
foldTy int unit TyUnit = unit
eval :: Ty a -> a
eval = foldTy id ()

Generics : run-time ADT for types with instances

Is it possible with Haskell / GHC, to extract an algebraic data type representing all types with Eq and Ord instances ? This would probably need Generics, Typeable, etc.
What I would like is something like :
data Data_Eq_Ord = Data_String String
| Data_Int Int
| Data_Bool Bool
| ...
deriving (Eq, Ord)
For all types known to have instances for Eq and Ord. If it makes the solution easier, we can limit our scope to Ord instances, since Eq is implied by Ord. But is would be interesting to know if constraints intersection is possible.
This data type would be useful because it gives the possibility to use it where Eq and Ord constraints are required, and pattern-match at runtime to refine on types.
I would need this to implement a generic Map Key Value, where Key would be this type, in a Document Indexing library, where the keys and their type is known at run-time. This library is here. For the moment I worked around the issue by defining a data DocIndexKey, and a FieldKey class, but this is not fully satisfactory since it requires boilerplate and can't cover all legit candidates.
Any good alternative approach to this situation is welcome. For some reasons, I prefer to avoid Template Haskell.
Well, it's not an ADT, but this definitely works:
data Satisfying c = forall a. c a => Satisfy a
class (l a, r a) => And l r a where
instance (l a, r a) => And l r a where
ex :: [Satisfying (Typeable `And` Show `And` Ord)]
ex = [ Satisfy (7 :: Int)
, Satisfy "Hello"
, Satisfy (5 :: Int)
, Satisfy [10..20 :: Int]
, Satisfy ['a'..'z']
, Satisfy ((), 'a')]
-- An example of use, with "complicated" logic
data With f c = forall a. c a => With (f a)
-- vvvvvvvvvvvvvvvvvvvvvvvvvv QuantifiedConstraints chokes on this, which is probably a bug...
partitionTypes :: (forall a. c a => TypeRep a) -> [Satisfying c] -> [[] `With` c]
partitionTypes rep = foldr go []
where go (Satisfy x) [] = [With [x]]
go x'#(Satisfy (x :: a)) (xs'#(With (xs :: [b])) : xss) =
case testEquality rep rep :: Maybe (a :~: b) of
Just Refl -> With (x : xs) : xss
Nothing -> xs' : go x' xss
main :: IO ()
main = traverse_ (\(With xs) -> print (sort xs)) $ partitionTypes typeRep ex
Exhaustivity is much harder. Perhaps with a plugin, you could get GHC to do it, but why bother? I don't believe GHC actually tries to keep track of what types it has seen. In particular, you'd have to scan all modules in the project and its dependencies, even those that haven't been loaded by the module containing the type definition. You'd have to implement it from the ground-up. And, as this answer shows, I very much doubt you would actually be able to use such exhaustivity for anything that you can't already do by just taking the open universe as it is.

How do you apply function constraints in instance methods in Haskell?

I'm learning how to use typeclasses in Haskell.
Consider the following implementation of a typeclass T with a type constrained class function f.
class T t where
f :: (Eq u) => t -> u
data T_Impl = T_Impl_Bool Bool | T_Impl_Int Int | T_Impl_Float Float
instance T T_Impl where
f (T_Impl_Bool x) = x
f (T_Impl_Int x) = x
f (T_Impl_Float x) = x
When I load this into GHCI 7.10.2, I get the following error:
Couldn't match expected type ‘u’ with actual type ‘Float’
‘u’ is a rigid type variable bound by
the type signature for f :: Eq u => T_Impl -> u
at generics.hs:6:5
Relevant bindings include
f :: T_Impl -> u (bound at generics.hs:6:5)
In the expression: x
In an equation for ‘f’: f (T_Impl_Float x) = x
What am I doing/understanding wrong? It seems reasonable to me that one would want to specialize a typeclass in an instance by providing an accompaning data constructor and function implementation. The part
Couldn't match expected type 'u' with actual type 'Float'
is especially confusing. Why does u not match Float if u only has the constraint that it must qualify as an Eq type (Floats do that afaik)?
The signature
f :: (Eq u) => t -> u
means that the caller can pick t and u as wanted, with the only burden of ensuring that u is of class Eq (and t of class T -- in class methods there's an implicit T t constraint).
It does not mean that the implementation can choose any u.
So, the caller can use f in any of these ways: (with t in class T)
f :: t -> Bool
f :: t -> Char
f :: t -> Int
...
The compiler is complaining that your implementation is not general enough to cover all these cases.
Couldn't match expected type ‘u’ with actual type ‘Float’
means "You gave me a Float, but you must provide a value of the general type u (where u will be chosen by the caller)"
Chi has already pointed out why your code doesn't compile. But it's not even that typeclasses are the problem; indeed, your example has only one instance, so it might just as well be a normal function rather than a class.
Fundamentally, the problem is that you're trying to do something like
foobar :: Show x => Either Int Bool -> x
foobar (Left x) = x
foobar (Right x) = x
This won't work. It tries to make foobar return a different type depending on the value you feed it at run-time. But in Haskell, all types must be 100% determined at compile-time. So this cannot work.
There are several things you can do, however.
First of all, you can do this:
foo :: Either Int Bool -> String
foo (Left x) = show x
foo (Right x) = show x
In other words, rather than return something showable, actually show it. That means the result type is always String. It means that which version of show gets called will vary at run-time, but that's fine. Code paths can vary at run-time, it's types which cannot.
Another thing you can do is this:
toInt :: Either Int Bool -> Maybe Int
toInt (Left x) = Just x
toInt (Right x) = Nothing
toBool :: Either Int Bool -> Maybe Bool
toBool (Left x) = Nothing
toBool (Right x) = Just x
Again, that works perfectly fine.
There are other things you can do; without knowing why you want this, it's difficult to suggest others.
As a side note, you want to stop thinking about this like it's object oriented programming. It isn't. It requires a new way of thinking. In particular, don't reach for a typeclass unless you really need one. (I realise this particular example may just be a learning exercise to learn about typeclasses of course...)
It's possible to do this:
class Eq u => T t u | t -> u where
f :: t -> u
You need FlexibleContextx+FunctionalDepencencies and MultiParamTypeClasses+FlexibleInstances on call-site. Or to eliminate class and to use data types instead like Gabriel shows here

Typed abstract syntax and DSL design in Haskell

I'm designing a DSL in Haskell and I would like to have an assignment operation. Something like this (the code below is just for explaining my problem in a limited context, I didn't have type checked Stmt type):
data Stmt = forall a . Assign String (Exp a) -- Assignment operation
| forall a. Decl String a -- Variable declaration
data Exp t where
EBool :: Bool -> Exp Bool
EInt :: Int -> Exp Int
EAdd :: Exp Int -> Exp Int -> Exp Int
ENot :: Exp Bool -> Exp Bool
In the previous code, I'm able to use a GADT to enforce type constraints on expressions. My problem is how can I enforce that the left hand side of an assignment is: 1) Defined, i.e., a variable must be declared before it is used and 2) The right hand side must have the same type of the left hand side variable?
I know that in a full dependently typed language, I could define statements indexed by some sort of typing context, that is, a list of defined variables and their type. I believe that this would solve my problem. But, I'm wondering if there is some way to achieve this in Haskell.
Any pointer to example code or articles is highly appreciated.
Given that my work focuses on related issues of scope and type safety being encoded at the type-level, I stumbled upon this old-ish question whilst googling around and thought I'd give it a try.
This post provides, I think, an answer quite close to the original specification. The whole thing is surprisingly short once you have the right setup.
First, I'll start with a sample program to give you an idea of what the end result looks like:
program :: Program
program = Program
$ Declare (Var :: Name "foo") (Of :: Type Int)
:> Assign (The (Var :: Name "foo")) (EInt 1)
:> Declare (Var :: Name "bar") (Of :: Type Bool)
:> increment (The (Var :: Name "foo"))
:> Assign (The (Var :: Name "bar")) (ENot $ EBool True)
:> Done
Scoping
In order to ensure that we may only assign values to variables which have been declared before, we need a notion of scope.
GHC.TypeLits provides us with type-level strings (called Symbol) so we can very-well use strings as variable names if we want. And because we want to ensure type safety, each variable declaration comes with a type annotation which we will store together with the variable name. Our type of scopes is therefore: [(Symbol, *)].
We can use a type family to test whether a given Symbol is in scope and return its associated type if that is the case:
type family HasSymbol (g :: [(Symbol,*)]) (s :: Symbol) :: Maybe * where
HasSymbol '[] s = 'Nothing
HasSymbol ('(s, a) ': g) s = 'Just a
HasSymbol ('(t, a) ': g) s = HasSymbol g s
From this definition we can define a notion of variable: a variable of type a in scope g is a symbol s such that HasSymbol g s returns 'Just a. This is what the ScopedSymbol data type represents by using an existential quantification to store the s.
data ScopedSymbol (g :: [(Symbol,*)]) (a :: *) = forall s.
(HasSymbol g s ~ 'Just a) => The (Name s)
data Name (s :: Symbol) = Var
Here I am purposefully abusing notations all over the place: The is the constructor for the type ScopedSymbol and Name is a Proxy type with a nicer name and constructor. This allows us to write such niceties as:
example :: ScopedSymbol ('("foo", Int) ': '("bar", Bool) ': '[]) Bool
example = The (Var :: Name "bar")
Statements
Now that we have a notion of scope and of well-typed variables in that scope, we can start considering the effects Statements should have. Given that new variables can be declared in a Statement, we need to find a way to propagate this information in the scope. The key hindsight is to have two indices: an input and an output scope.
To Declare a new variable together with its type will expand the current scope with the pair of the variable name and the corresponding type.
Assignments on the other hand do not modify the scope. They merely associate a ScopedSymbol to an expression of the corresponding type.
data Statement (g :: [(Symbol, *)]) (h :: [(Symbol,*)]) where
Declare :: Name s -> Type a -> Statement g ('(s, a) ': g)
Assign :: ScopedSymbol g a -> Exp g a -> Statement g g
data Type (a :: *) = Of
Once again we have introduced a proxy type to have a nicer user-level syntax.
example' :: Statement '[] ('("foo", Int) ': '[])
example' = Declare (Var :: Name "foo") (Of :: Type Int)
example'' :: Statement ('("foo", Int) ': '[]) ('("foo", Int) ': '[])
example'' = Assign (The (Var :: Name "foo")) (EInt 1)
Statements can be chained in a scope-preserving way by defining the following GADT of type-aligned sequences:
infixr 5 :>
data Statements (g :: [(Symbol, *)]) (h :: [(Symbol,*)]) where
Done :: Statements g g
(:>) :: Statement g h -> Statements h i -> Statements g i
Expressions
Expressions are mostly unchanged from your original definition except that they are now scoped and a new constructor EVar lets us dereference a previously-declared variable (using ScopedSymbol) giving us an expression of the appropriate type.
data Exp (g :: [(Symbol,*)]) (t :: *) where
EVar :: ScopedSymbol g a -> Exp g a
EBool :: Bool -> Exp g Bool
EInt :: Int -> Exp g Int
EAdd :: Exp g Int -> Exp g Int -> Exp g Int
ENot :: Exp g Bool -> Exp g Bool
Programs
A Program is quite simply a sequence of statements starting in the empty scope. We use, once more, an existential quantification to hide the scope we end up with.
data Program = forall h. Program (Statements '[] h)
It is obviously possible to write subroutines in Haskell and use them in your programs. In the example, I have the very simple increment which can be defined like so:
increment :: ScopedSymbol g Int -> Statement g g
increment v = Assign v (EAdd (EVar v) (EInt 1))
I have uploaded the whole code snippet together with the right LANGUAGE pragmas and the examples listed here in a self-contained gist. I haven't however included any comments there.
You should know that your goals are quite lofty. I don't think you will get very far treating your variables exactly as strings. I'd do something slightly more annoying to use, but more practical. Define a monad for your DSL, which I'll call M:
newtype M a = ...
data Exp a where
... as before ...
data Var a -- a typed variable
assign :: Var a -> Exp a -> M ()
declare :: String -> a -> M (Var a)
I'm not sure why you have Exp a for assignment and just a for declaration, but I reproduced that here. The String in declare is just for cosmetics, if you need it for code generation or error reporting or something -- the identity of the variable should really not be tied to that name. So it's usually used as
myFunc = do
foobar <- declare "foobar" 42
which is the annoying redundant bit. Haskell doesn't really have a good way around this (though depending on what you're doing with your DSL, you may not need the string at all).
As for the implementation, maybe something like
data Stmt = forall a. Assign (Var a) (Exp a)
| forall a. Declare (Var a) a
data Var a = Var String Integer -- string is auxiliary from before, integer
-- stores real identity.
For M, we need a unique supply of names and a list of statements to output.
newtype M a = M { runM :: WriterT [Stmt] (StateT Integer Identity a) }
deriving (Functor, Applicative, Monad)
Then the operations as usually fairly trivial.
assign v a = M $ tell [Assign v a]
declare name a = M $ do
ident <- lift get
lift . put $! ident + 1
let var = Var name ident
tell [Declare var a]
return var
I've made a fairly large DSL for code generation in another language using a fairly similar design, and it scales well. I find it a good idea to stay "near the ground", just doing solid modeling without using too many fancy type-level magical features, and accepting minor linguistic annoyances. That way Haskell's main strength -- it's ability to abstract -- can still be used for code in your DSL.
One drawback is that everything needs to be defined within a do block, which can be a hinderance to good organization as the amount of code grows. I'll steal declare to show a way around that:
declare :: String -> M a -> M a
used like
foo = declare "foo" $ do
-- actual function body
then your M can have as a component of its state a cache from names to variables, and the first time you use a declaration with a certain name you render it and put it in a variable (this will require a bit more sophisticated monoid than [Stmt] as the target of your Writer). Later times you just look up the variable. It does have a rather floppy dependence on uniqueness of names, unfortunately; an explicit model of namespaces can help with that but never eliminate it entirely.
After seeing all the code by #Cactus and the Haskell suggestions by #luqui, I've managed to got a solution close to what I want in Idris. The complete code is available at the following gist:
(https://gist.github.com/rodrigogribeiro/33356c62e36bff54831d)
Some little things I need to fix in the previous solution:
I don't know (yet) if Idris support integer literal overloading, what would be quite useful to build my DSL.
I've tried to define in DSL syntax a prefix operator for program variables, but it didn't worked as I like. I've got a solution (in the previous gist) that uses a keyword --- use --- for variable access.
I'll check this minor points with guys in Idris #freenode channel to see if these two points are possible.

Does exporting type constructors make a difference?

Let's say I have an internal data type, T a, that is used in the signature of exported functions:
module A (f, g) where
newtype T a = MkT { unT :: (Int, a) }
deriving (Functor, Show, Read) -- for internal use
f :: a -> IO (T a)
f a = fmap (\i -> T (i, a)) randomIO
g :: T a -> a
g = snd . unT
What is the effect of not exporting the type constructor T? Does it prevent consumers from meddling with values of type T a? In other words, is there a difference between the export list (f, g) and (f, g, T()) here?
Prevented
The first thing a consumer will see is that the type doesn't appear in Haddock documentation. In the documentation for f and g, the type Twill not be hyperlinked like an exported type. This may prevent a casual reader from discovering T's class instances.
More importantly, a consumer cannot doing anything with T at the type level. Anything that requires writing a type will be impossible. For instance, a consumer cannot write new class instances involving T, or include T in a type family. (I don't think there's a way around this...)
At the value level, however, the main limitation is that a consumer cannot write a type annotation including T:
> :t (f . read) :: Read b => String -> IO (A.T b)
<interactive>:1:39: Not in scope: type constructor or class `A.T'
Not prevented
The restriction on type signatures is not as significant a limitation as it appears. The compiler can still infer such a type:
> :t f . read
f . read :: Read b => String -> IO (A.T b)
Any value expression within the inferrable subset of Haskell may therefore be expressed regardless of the availability of the type constructor T. If, like me, you're addicted to ScopedTypeVariables and extensive annotations, you may be a little surprised by the definition of unT' below.
Furthermore, because typeclass instances have global scope, a consumer can use any available class functions without additional limitation. Depending on the classes involved, this may allow significant manipulation of values of the unexposed type. With classes like Functor, a consumer can also freely manipulate type parameters, because there's an available function of type T a -> T b.
In the example of T, deriving Show of course exposes the "internal" Int, and gives a consumer enough information to hackishly implement unT:
-- :: (Show a, Read a) => T a -> (Int, a)
unT' = (read . strip . show') `asTypeOf` (mkPair . g)
where
strip = reverse . drop 1 . reverse . drop 9
-- :: T a -> String
show' = show `asTypeOf` (mkString . g)
mkPair :: t -> (Int, t)
mkPair = undefined
mkString :: t -> String
mkString = undefined
> :t unT'
unT' :: (Show b, Read b) => A.T b -> (Int, b)
> x <- f "x"
> unT' x
(-29353, "x")
Implementing mkT' with the Read instance is left as an exercise.
Deriving something like Generic will completely explode any idea of containment, but you'd probably expect that.
Prevented?
In the corners of Haskell where type signatures are necessary or where asTypeOf-style tricks don't work, I guess not exporting the type constructor could actually prevent a consumer from doing something they could with the export list (f, g, T()).
Recommendation
Export all type constructors that are used in the type of any value you export. Here, go ahead and include T() in your export list. Leaving it out doesn't accomplish anything other than muddying the documentation. If you want to expose an purely abstract immutable type, use a newtype with a hidden constructor and no class instances.

Resources