Why are ML/Haskell datatypes useful for defining "languages" like arithmetic expressions?

This is more of a soft question about static type systems in functional languages like those of the ML family. I understand why you need datatypes to describe data structures like lists and trees but defining "expressions" like those of propositional logic within datatypes seems to bring just some convenience and is not necessary. For example
datatype arithmetic_exp = Constant of int
                        | Neg of arithmetic_exp
                        | Add of (arithmetic_exp * arithmetic_exp)
                        | Mult of (arithmetic_exp * arithmetic_exp)
defines a set of values on which you can write an eval function that gives you the result. You could just as well define four functions, const: int -> int, neg: int -> int, add: int * int -> int and mult: int * int -> int, and then an expression such as add (mult (const 3, neg 2), neg 4) would give you the same thing without any loss of static security. The only complication is that you have to do four things instead of two. While learning SML and Haskell I've been trying to work out which features give you something necessary and which are just a convenience, which is why I'm asking. I guess this would matter if you want to decouple the process of evaluating a value from the value itself, but I'm not sure where that would be useful.
Thanks a lot.

There is a duality between initial / first-order / datatype-based encodings (aka deep embeddings) and final / higher-order / evaluator-based encodings (aka shallow embeddings). You can indeed typically use a typeclass of combinators instead of a datatype (and convert back and forth between the two).
Here is a module showing the two approaches:
{-# LANGUAGE GADTs, Rank2Types #-}
module Expr where

data Expr where
  Val :: Int -> Expr
  Add :: Expr -> Expr -> Expr

class Expr' a where
  val :: Int -> a
  add :: a -> a -> a
You can see that the two definitions look eerily similar. Expr' a is basically describing an algebra on Expr which means that you can get an a out of an Expr if you have such an Expr' a. Similarly, because you can write an instance Expr' Expr, you're able to reify a term of type forall a. Expr' a => a into a syntactic value of type Expr:
expr :: Expr' a => Expr -> a
expr e = case e of
  Val n   -> val n
  Add p q -> add (expr p) (expr q)

instance Expr' Expr where
  val = Val
  add = Add

expr' :: (forall a. Expr' a => a) -> Expr
expr' e = e
In the end, picking a representation over another really depends on what your main focus is: if you want to inspect the structure of the expression (e.g. if you want to optimise / compile it), it's easier if you have access to an AST. If, on the other hand, you're only interested in computing an invariant using a fold (e.g. the depth of the expression or its evaluation), a higher order encoding will do.
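For a concrete illustration (my own sketch, not part of the original answer): evaluation is nothing more than an Expr' instance at Int, and expr then folds any reified term through it.
instance Expr' Int where
  val = id
  add = (+)

evalExpr :: Expr -> Int
evalExpr = expr   -- e.g. evalExpr (Add (Val 1) (Val 2)) == 3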

The ADT is in a form you can inspect and manipulate in ways other than simply evaluating it. Once you hide all the interesting data in a function call, there is no longer anything you can do with it but evaluate it. Consider this definition, similar to the one in your question, but with a Var term to represent variables and with the Mul and Neg terms removed to focus on addition.
data Expr a = Constant a
            | Add (Expr a) (Expr a)
            | Var String
  deriving Show
The obvious function to write is eval, of course. It requires a way to look up a variable's value, and is straightforward:
-- cheating a little bit by assuming all Vars are defined
eval :: Num a => Expr a -> (String -> a) -> a
eval (Constant x) _env = x
eval (Add x y) env = eval x env + eval y env
eval (Var x) env = env x
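For example, a quick check in GHCi (my own example, assuming the definitions above):
> eval (Add (Constant 2) (Var "x")) (\_ -> 40)
42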
But suppose you don't have a variable mapping yet. You have a large expression that you will evaluate many times, for different choices of variable. And some silly recursive function built up an expression like:
Add (Constant 1)
    (Add (Constant 1)
         (Add (Constant 1)
              (Add (Constant 1)
                   (Add (Constant 1)
                        (Add (Constant 1)
                             (Var "x"))))))
It would be wasteful to recompute 1+1+1+1+1+1 every time you evaluate this: wouldn't it be nice if your evaluator could realize that this is just another way of writing Add (Constant 6) (Var "x")?
So, you write an expression optimizer, which runs before any variables are available and tries to simplify expressions. There are many simplification rules you could apply, of course; below I've implemented just two very easy ones to illustrate the point.
simplify :: Num a => Expr a -> Expr a
simplify (Add (Constant x) (Constant y)) = Constant $ x + y
simplify (Add (Constant x) (Add (Constant y) z)) = simplify $ Add (Constant $ x + y) z
simplify x = x
Now how does our silly expression look?
> simplify $ Add (Constant 1) (Add (Constant 1) (Add (Constant 1) (Add (Constant 1) (Add (Constant 1) (Add (Constant 1) (Var "x"))))))
Add (Constant 6) (Var "x")
All the unnecessary stuff has been removed, and you now have a nice clean expression to try for various values of x.
How do you do the same thing with a representation of this expression in functions? You can't, because there is no "intermediate form" between the initial specification of the expression and its final evaluation: you can only look at the expression as a single, opaque function call. Evaluating it at a particular value of x necessarily evaluates each of the subexpressions anew, and there is no way to disentangle them.
Here's an extension of the functional type you propose in your question, again enriched with variables:
type FExpr a = (String -> a) -> a
lit :: a -> FExpr a
lit x _env = x
add :: Num a => FExpr a -> FExpr a -> FExpr a
add x y env = x env + y env
var :: String -> FExpr a
var x env = env x
with the same silly expression to evaluate many times:
sample :: Num a => FExpr a
sample = add (lit 1)
             (add (lit 1)
                  (add (lit 1)
                       (add (lit 1)
                            (add (lit 1)
                                 (add (lit 1)
                                      (var "x"))))))
It works as expected:
> sample $ \_var -> 5
11
But it has to do a bunch of addition every time you try it for a different x, even though the addition and the variable are mostly unrelated. And there's no way you can simplify the expression tree. You can't simplify it while defining it: that is, you can't make add smarter, because it can't inspect its arguments at all: its arguments are functions which, as far as add is concerned, could do anything at all. And you can't simplify it after constructing it, either: at that point you just have an opaque function that takes in a variable lookup function and produces a value.
By modeling the important parts of your problem as data types in their own right, you make them values that your program can manipulate intelligently. If you leave them as functions, you get a shorter program that is less powerful, because you lock all the information inside lambdas that only GHC can manipulate.
And once you've written it with ADTs, it's not hard to collapse that representation back into the shorter function-based representation if you want. That is, it might be nice to have a function of type
convert :: Expr a -> FExpr a
But in fact, we've already done this! That's exactly the type that eval has. You just might not have noticed because of the FExpr type alias, which is not used in the definition of eval.
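Spelled out with the alias (and keeping the Num constraint that eval needs for +), that signature reads:
eval :: Num a => Expr a -> FExpr a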
So in a way, the ADT representation is more general and more powerful, acting as a tree that you can fold up in many different ways. One of those ways is evaluating it, in exactly the way that the function-based representation does. But there are others:
Simplify the expression before evaluating it
Produce a list of all the variables that must be defined for this expression to be well formed
Count how deeply nested the deepest part of the tree is, to estimate how many stack frames an evaluator might need
Convert the expression to a String approximating a Haskell expression you could type to get the same result
So if possible you'd like to work with information-rich ADTs for as long as you can, and then eventually fold the tree up into a more compact form once you have something specific to do with it.
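As one small illustration of the second item in that list (my own sketch, reusing the Expr type above):
-- collect every variable name an expression mentions
vars :: Expr a -> [String]
vars (Constant _) = []
vars (Add x y)    = vars x ++ vars y
vars (Var s)      = [s]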

Related

a way to simulate Union types without type operators

I have done this before using type operators, but I want to exclude those because I want to be able to do it with a smaller hammer, because I actually want to do it in another language, and I'm not too sure type operators do quite what I want.
The setup is two data types, Integer and...
> data Rational = Rational Integer Integer deriving Show
two type classes with sensible instances...
> class Divide2 a where
>   divide2 :: a -> a
> class Increment a where
>   inc :: a -> a
> instance Increment Main.Rational where
>   inc (Rational a b) = Rational (a + b) b
> instance Divide2 Main.Rational where
>   divide2 (Rational a b) = Rational a (2 * b)
> instance Increment Integer where
>   inc x = x + 1
I can define things that work on instances of one type class or the other
> div4 x = divide2 (divide2 x)
> add2 :: Increment c => c -> c
> add2 = inc . inc
and then I want to take the union of these two data types...so the obvious thing to do is use a discriminated union:
> data Number = Rat Main.Rational | Int Integer
now...in my scenario, the functions that act on this union exist in one distinct module (a binary; I'm not familiar with Haskell's binaries), but the data types themselves are defined in another.
So clearly I can define some functions that (in principle) can potentially "work" on this union, e.g. functions that act on values of instances of Increment....and some that don't, e.g. one in Divide2.
So how do I write a function, against this discriminated union, that applies a function to values in the union, and that will compile for functions on Increment but won't compile for functions on Divide2...I'll have a go here, and fall flat on my face.
> apply (Rat r) f = f r
> apply (Int i) f = f i
• Couldn't match expected type ‘Main.Rational’
              with actual type ‘Integer’
• In the first argument of ‘f’, namely ‘i’
  In the expression: f i
  In an equation for ‘apply’: apply (Int i) f = f i
   |
86 | > apply (Int i) f = f i
   |                       ^
Failed, no modules loaded.
as expected, the inference says it's got to be a Rational because of the first clause, but it's an Integer...
but "clearly"...if I could make Haskell suspend disbelief...like some sort of macro...then the function
> apply (Int 1) add2
does make sense, and moreover, makes sense for any value of Number I care to choose.
so the obvious thing to do is to make Number a member of anything in the intersection of the set of type classes each member is in....
> instance Increment Number where
>   inc (Rat r) = Rat (inc r)
>   inc (Int i) = Int (inc i)
and then GHC implements "apply" itself...I CAN as well map this solution back into other languages by some explicit dictionary passing...but I have hundreds, if not thousands, of tiny typeclasses (I may even have to consider all their combinations as well).
so really I want to know: is there some type magic (existential? rank-n?) that means I CAN write "apply" against Number, without resorting to dependent type magic or having to implement instances of the type classes on the discriminated union?
P.S. I can do limited dependent type magic...but it's a last resort.
Edit...
The code that contains the functions defined on Number can of course match the discriminated values, but if they do, then whenever the union is extended they will fail to compile (ok, they don't have to match each case individually, but unless they do, they can't extract the wrapped value to apply the function, because it won't know the type).
Hmmm...written down it looks like the expression problem, in fact it IS the expression problem...I know of many solutions then...I just don't usually like any of them...let me knock up the canonical Haskell solution to this using type classes.
You can accept only functions that make use of Increment methods (and do not make use of any non-Incremental functionality) like this:
{-# LANGUAGE RankNTypes #-}
apply :: (forall a. Increment a => a -> a) -> Number -> Number
apply f (Rat r) = Rat (f r)
apply f (Int i) = Int (f i)
You can now pass add2 to apply if you like:
> apply add2 (Rat (Rational 3 4))
Rat (Rational 11 4)
In this specific case, implementing apply amounts to exactly the same thing as supplying an Increment instance for Number itself:
instance Increment Number where
  inc (Rat r) = Rat (inc r)
  inc (Int i) = Int (inc i)
...and now you don't even need the mediating apply function to apply add2:
> add2 (Rat (Rational 3 4))
Rat (Rational 11 4)
But this is a pretty special case; it won't always be so easy to just implement the appropriate classes for Number, and you will need to resort to something like the higher-rank types we used in apply instead.
So this IS the expression problem, and type classes solve this specific case.
You take the function you want to make general over some as-yet-undefined universe of types:
> class Add2 a where
>   add2' :: a -> a
> newtype Number' a = Number' a
> instance (Increment a) => Add2 (Number' a) where
>   add2' (Number' x) = Number' $ inc (inc x)
> three = add2 (Int 1)
and then make any type that satisfies the required preconditions (in terms of type classes) an instance of the type class for your generalised "function".
You then implement your new "Number" data types, and create instances of them where they make sense.

Mathematical AST in Haskell

I am currently trying to write an AST in Haskell. More specifically, I have a parser that converts text to AST and then I want to be able to simplify an AST into another AST.
For example x + x + x
-> Add (Add (Variable 'x') (Variable 'x')) (Variable 'x')
-> (Mul (Literal 3) (Variable 'x'))
-> 3x
I have found other examples but none that take into account different data types. I want to use this approach to allow simplification rules depending on what the inner type of the left and right side of a binary expression is.
Here is roughly what I have so far for my datatypes:
data UnaryExpression o = Literal o
                       | Variable Char

data BinaryExpression l lo r ro = Add (l lo) (r ro)
                                | Mul (l lo) (r ro)
                                | Exp (l lo) (r ro)
-- etc...
I think I have 2 problems:
First, I need to have the correct data structure, and being new to Haskell, I am not sure what is the correct approach.
Second, I need to have my simplify function that is aware of the left and right datatypes. I feel like there should be a way to do this, but I am not sure.
So I think what you actually want is something like this:
AST o should be a mathematical expression representing a value of numerical type o.
This can be either a literal of type o, or a binary expression containing expressions that represent more specialised number types than o (e.g. Int being more specialised than Double).
First, always keep it simple and avoid duplication, so we should only have one constructor in AST for all binary operators. For distinguishing between different operators, make a separate variant type:
data NumOperator = Addition | Multiplication | Exponentiation
Then, you need some way of expressing what you mean by "more specialised number type". Haskell has a bunch of numerical classes, but no standard notion of which types are more general than which others. One library that implements this is convertible, but it's a bit too liberal: "convert anything into anything else, regardless of whether it's semantically clear how". Here is a simple version:
{-# LANGUAGE MultiParamTypeClasses #-}
class ConvertNum a b where
  convertNum :: a -> b

instance ConvertNum Int Int where convertNum = id
instance ConvertNum Double Double where convertNum = id
...
instance ConvertNum Int Double where convertNum = fromIntegral
...
Then, you need a way to store different types in the binary-operator constructor. This is existential quantification, best expressed with a GADT:
{-# LANGUAGE GADTs #-}
data AST o where
  Literal :: o -> AST o
  Variable :: String -> AST o
  BinaryExpression :: (ConvertNum ol o, ConvertNum or o)
                   => NumOperator -> AST ol -> AST or -> AST o
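As a small usage sketch (my own example, assuming the instances above): the constraints let the two sides of an operator carry different but convertible types.
mixed :: AST Double
mixed = BinaryExpression Addition
          (Literal (3 :: Int))       -- accepted via ConvertNum Int Double
          (Literal (0.5 :: Double))  -- accepted via ConvertNum Double Double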

Getting all function arguments in Haskell as a list

Is there a way in Haskell to get all function arguments as a list?
Let's suppose we have the following program, where we want to add the two smaller numbers and then subtract the largest. Suppose we can't change the function definition of foo :: Int -> Int -> Int -> Int. Is there a way to get all function arguments as a list, other than constructing a new list and adding all arguments as elements of said list? More importantly, is there a general way of doing this independent of the number of arguments?
Example:
module Foo where

import Data.List

foo :: Int -> Int -> Int -> Int
foo a b c = result!!0 + result!!1 - result!!2
  where result = sort [a, b, c]
is there a general way of doing this independent of the number of arguments?
Not really; at least it's not worth it. First off, this entire idea isn't very useful because lists are homogeneous: all elements must have the same type, so it only works for the rather unusual special case of functions which only take arguments of a single type.
Even then, the problem is that “number of arguments” isn't really a sensible concept in Haskell, because as Willem Van Onsem commented, all functions really only have one argument (further arguments are actually only given to the result of the first application, which has again function type).
That said, at least for a single argument- and final-result type, it is quite easy to pack any number of arguments into a list:
{-# LANGUAGE FlexibleInstances #-}

import Data.List (sort)  -- as in the question's module

class UsingList f where
  usingList :: ([Int] -> Int) -> f

instance UsingList Int where
  usingList f = f []

instance UsingList r => UsingList (Int -> r) where
  usingList f a = usingList (f . (a:))

foo :: Int -> Int -> Int -> Int
foo = usingList $ (\[α,β,γ] -> α + β - γ) . sort
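A quick check (my own), matching the original foo (add the two smaller numbers, subtract the largest):
> foo 10 4 5
-1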
It's also possible to make this work for any type of the arguments, using type families or a multi-param type class. What's not so simple though is to write it once and for all with a variable type for the final result. The reason being, that would also have to handle a function as the type of the final result. But then, that could also be interpreted as "we still need to add one more argument to the list"!
With all respect, I would disagree with @leftaroundabout's answer above. Something being unusual is not a reason to shun it as unworthy.
It is correct that you would not be able to define a polymorphic variadic list constructor without type annotations. However, we're not usually dealing with Haskell 98, where type annotations were never required. With Dependent Haskell just around the corner, some familiarity with non-trivial type annotations is becoming vital.
So, let's take a shot at this, disregarding worthiness considerations.
One way to define a function that does not seem to admit a single type is to make it a method of a suitably constructed class. Many a trick involving type classes has been devised by cunning Haskellers, starting at least as early as 15 years ago. Even if we don't understand their type wizardry in all its depth, we may still try our hand with a similar approach.
Let us first try to obtain a method for summing any number of Integers. That means repeatedly applying a function like (+), with a uniform type such as a -> a -> a. Here's one way to do it:
class Eval a where
  eval :: Integer -> a

instance (Eval a) => Eval (Integer -> a) where
  eval i = \y -> eval (i + y)

instance Eval Integer where
  eval i = i
And this is the extract from repl:
λ eval 1 2 3 :: Integer
6
Notice that we can't do without explicit type annotation, because the very idea of our approach is that an expression eval x1 ... xn may either be a function that waits for yet another argument, or a final value.
One generalization now is to actually make a list of values. The science tells us that we may derive any monoid from a list. Indeed, insofar as sum is a monoid, we may turn the arguments into a list, then sum it and obtain the same result as above.
class Eval a where
  eval2 :: [Integer] -> Integer -> a

instance (Eval a) => Eval (Integer -> a) where
  eval2 is i = \j -> eval2 (i:is) j

instance Eval [Integer] where
  eval2 is i = i:is
This is how it would work:
λ eval2 [] 1 2 3 4 5 :: [Integer]
[5,4,3,2,1]
Unfortunately, we have to make eval binary, rather than unary, because it now has to compose two different things: a (possibly empty) list of values and the next value to put in. Notice how it's similar to the usual foldr:
λ foldr (:) [] [1,2,3,4,5]
[1,2,3,4,5]
The next generalization we'd like to have is allowing arbitrary types inside the list. It's a bit tricky, as we have to make Eval a 2-parameter type class:
class Eval a i where
  eval2 :: [i] -> i -> a

instance (Eval a i) => Eval (i -> a) i where
  eval2 is i = \j -> eval2 (i:is) j

instance Eval [i] i where
  eval2 is i = i:is
It works as before with Integers, but it can also carry any other type, even a function (I'm sorry for the messy example; I had to show a function somehow):
λ ($ 10) <$> (eval2 [] (+1) (subtract 2) (*3) (^4) :: [Integer -> Integer])
[10000,30,8,11]
So far so good: we can convert any number of arguments into a list. However, it will be hard to compose this function with one that does useful work with the resulting list, because composition only admits unary functions − with some trickery, binary ones, but never variadic ones. It seems we'll have to define our own way to compose functions. That's how I see it:
class Ap a i r where
  apply :: ([i] -> r) -> [i] -> i -> a
  apply', ($...) :: ([i] -> r) -> i -> a
  ($...) = apply'

instance Ap a i r => Ap (i -> a) i r where
  apply f xs x = \y -> apply f (x:xs) y
  apply' f x = \y -> apply f [x] y

instance Ap r i r where
  apply f xs x = f $ x:xs
  apply' f x = f [x]
Now we can write our desired function as an application of a list-admitting function to any number of arguments:
foo' :: (Num r, Ord r, Ap a r r) => r -> a
foo' = (g $...)
  where f = (\result -> (result !! 0) + (result !! 1) - (result !! 2))
        g = f . sort
You'll still have to type annotate it at every call site, like this:
λ foo' 4 5 10 :: Integer
-1
− But so far, that's the best I can do.
The more I study Haskell, the more I am certain that nothing is impossible.

Typed abstract syntax and DSL design in Haskell

I'm designing a DSL in Haskell and I would like to have an assignment operation. Something like this (the code below is just for explaining my problem in a limited context; I haven't type checked the Stmt type):
data Stmt = forall a . Assign String (Exp a) -- Assignment operation
          | forall a . Decl String a         -- Variable declaration

data Exp t where
  EBool :: Bool -> Exp Bool
  EInt  :: Int -> Exp Int
  EAdd  :: Exp Int -> Exp Int -> Exp Int
  ENot  :: Exp Bool -> Exp Bool
In the previous code, I'm able to use a GADT to enforce type constraints on expressions. My problem is how I can enforce that 1) the left-hand side of an assignment is defined, i.e., a variable must be declared before it is used, and 2) the right-hand side has the same type as the left-hand side variable.
I know that in a full dependently typed language, I could define statements indexed by some sort of typing context, that is, a list of defined variables and their type. I believe that this would solve my problem. But, I'm wondering if there is some way to achieve this in Haskell.
Any pointer to example code or articles is highly appreciated.
Given that my work focuses on related issues of scope and type safety being encoded at the type-level, I stumbled upon this old-ish question whilst googling around and thought I'd give it a try.
This post provides, I think, an answer quite close to the original specification. The whole thing is surprisingly short once you have the right setup.
First, I'll start with a sample program to give you an idea of what the end result looks like:
program :: Program
program = Program
   $  Declare (Var :: Name "foo") (Of :: Type Int)
   :> Assign  (The (Var :: Name "foo")) (EInt 1)
   :> Declare (Var :: Name "bar") (Of :: Type Bool)
   :> increment (The (Var :: Name "foo"))
   :> Assign  (The (Var :: Name "bar")) (ENot $ EBool True)
   :> Done
Scoping
In order to ensure that we may only assign values to variables which have been declared before, we need a notion of scope.
GHC.TypeLits provides us with type-level strings (called Symbol), so we can very well use strings as variable names if we want. And because we want to ensure type safety, each variable declaration comes with a type annotation which we will store together with the variable name. Our type of scopes is therefore: [(Symbol, *)].
We can use a type family to test whether a given Symbol is in scope and return its associated type if that is the case:
type family HasSymbol (g :: [(Symbol,*)]) (s :: Symbol) :: Maybe * where
  HasSymbol '[]            s = 'Nothing
  HasSymbol ('(s, a) ': g) s = 'Just a
  HasSymbol ('(t, a) ': g) s = HasSymbol g s
From this definition we can define a notion of variable: a variable of type a in scope g is a symbol s such that HasSymbol g s returns 'Just a. This is what the ScopedSymbol data type represents by using an existential quantification to store the s.
data ScopedSymbol (g :: [(Symbol,*)]) (a :: *) = forall s.
  (HasSymbol g s ~ 'Just a) => The (Name s)

data Name (s :: Symbol) = Var
Here I am purposefully abusing notations all over the place: The is the constructor for the type ScopedSymbol and Name is a Proxy type with a nicer name and constructor. This allows us to write such niceties as:
example :: ScopedSymbol ('("foo", Int) ': '("bar", Bool) ': '[]) Bool
example = The (Var :: Name "bar")
Statements
Now that we have a notion of scope and of well-typed variables in that scope, we can start considering the effects Statements should have. Given that new variables can be declared in a Statement, we need to find a way to propagate this information in the scope. The key insight is to have two indices: an input and an output scope.
To Declare a new variable together with its type will expand the current scope with the pair of the variable name and the corresponding type.
Assignments on the other hand do not modify the scope. They merely associate a ScopedSymbol to an expression of the corresponding type.
data Statement (g :: [(Symbol, *)]) (h :: [(Symbol,*)]) where
  Declare :: Name s -> Type a -> Statement g ('(s, a) ': g)
  Assign  :: ScopedSymbol g a -> Exp g a -> Statement g g

data Type (a :: *) = Of
Once again we have introduced a proxy type to have a nicer user-level syntax.
example' :: Statement '[] ('("foo", Int) ': '[])
example' = Declare (Var :: Name "foo") (Of :: Type Int)
example'' :: Statement ('("foo", Int) ': '[]) ('("foo", Int) ': '[])
example'' = Assign (The (Var :: Name "foo")) (EInt 1)
Statements can be chained in a scope-preserving way by defining the following GADT of type-aligned sequences:
infixr 5 :>
data Statements (g :: [(Symbol, *)]) (h :: [(Symbol,*)]) where
  Done :: Statements g g
  (:>) :: Statement g h -> Statements h i -> Statements g i
Expressions
Expressions are mostly unchanged from your original definition except that they are now scoped and a new constructor EVar lets us dereference a previously-declared variable (using ScopedSymbol) giving us an expression of the appropriate type.
data Exp (g :: [(Symbol,*)]) (t :: *) where
  EVar  :: ScopedSymbol g a -> Exp g a
  EBool :: Bool -> Exp g Bool
  EInt  :: Int -> Exp g Int
  EAdd  :: Exp g Int -> Exp g Int -> Exp g Int
  ENot  :: Exp g Bool -> Exp g Bool
Programs
A Program is quite simply a sequence of statements starting in the empty scope. We use, once more, an existential quantification to hide the scope we end up with.
data Program = forall h. Program (Statements '[] h)
It is obviously possible to write subroutines in Haskell and use them in your programs. In the example, I have the very simple increment which can be defined like so:
increment :: ScopedSymbol g Int -> Statement g g
increment v = Assign v (EAdd (EVar v) (EInt 1))
I have uploaded the whole code snippet together with the right LANGUAGE pragmas and the examples listed here in a self-contained gist. I haven't however included any comments there.
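As an extra sanity check (my own snippet, not part of the gist, assuming the same LANGUAGE pragmas): a small well-scoped sequence type-checks with the scope it produces, and swapping the Declare and the first Assign is rejected by the type checker because "n" would not yet be in scope.
countUp :: Statements '[] ('("n", Int) ': '[])
countUp =  Declare (Var :: Name "n") (Of :: Type Int)
        :> Assign  (The (Var :: Name "n")) (EInt 0)
        :> increment (The (Var :: Name "n"))
        :> Done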
You should know that your goals are quite lofty. I don't think you will get very far treating your variables exactly as strings. I'd do something slightly more annoying to use, but more practical. Define a monad for your DSL, which I'll call M:
newtype M a = ...

data Exp a where
  ... as before ...

data Var a -- a typed variable

assign  :: Var a -> Exp a -> M ()
declare :: String -> a -> M (Var a)
I'm not sure why you have Exp a for assignment and just a for declaration, but I reproduced that here. The String in declare is just for cosmetics, if you need it for code generation or error reporting or something -- the identity of the variable should really not be tied to that name. So it's usually used as
myFunc = do
  foobar <- declare "foobar" 42
which is the annoying redundant bit. Haskell doesn't really have a good way around this (though depending on what you're doing with your DSL, you may not need the string at all).
As for the implementation, maybe something like
data Stmt = forall a. Assign (Var a) (Exp a)
          | forall a. Declare (Var a) a

data Var a = Var String Integer -- string is auxiliary from before,
                                -- integer stores real identity
For M, we need a unique supply of names and a list of statements to output.
newtype M a = M { runM :: WriterT [Stmt] (StateT Integer Identity) a }
  deriving (Functor, Applicative, Monad)
  -- (the deriving clause needs GeneralizedNewtypeDeriving)
Then the operations are fairly trivial.
assign v a = M $ tell [Assign v a]

declare name a = M $ do
  ident <- lift get
  lift . put $! ident + 1
  let var = Var name ident
  tell [Declare var a]
  return var
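A sketch of how you might run the monad to recover the statement list (the helper name runProgram is mine; it assumes the usual mtl imports, Control.Monad.Writer, Control.Monad.State and Data.Functor.Identity):
runProgram :: M a -> (a, [Stmt])
runProgram m = runIdentity (evalStateT (runWriterT (runM m)) 0)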
I've made a fairly large DSL for code generation in another language using a fairly similar design, and it scales well. I find it a good idea to stay "near the ground", just doing solid modeling without using too many fancy type-level magical features, and accepting minor linguistic annoyances. That way Haskell's main strength -- its ability to abstract -- can still be used for code in your DSL.
One drawback is that everything needs to be defined within a do block, which can be a hindrance to good organization as the amount of code grows. I'll steal declare to show a way around that:
declare :: String -> M a -> M a
used like
foo = declare "foo" $ do
-- actual function body
then your M can have as a component of its state a cache from names to variables, and the first time you use a declaration with a certain name you render it and put it in a variable (this will require a bit more sophisticated monoid than [Stmt] as the target of your Writer). Later times you just look up the variable. It does have a rather floppy dependence on uniqueness of names, unfortunately; an explicit model of namespaces can help with that but never eliminate it entirely.
After seeing all the code by @Cactus and the Haskell suggestions by @luqui, I've managed to get a solution close to what I want in Idris. The complete code is available at the following gist:
(https://gist.github.com/rodrigogribeiro/33356c62e36bff54831d)
Some little things I need to fix in the previous solution:
I don't know (yet) if Idris supports integer literal overloading, which would be quite useful for building my DSL.
I've tried to define a prefix operator in the DSL syntax for program variables, but it didn't work as I'd like. I've got a solution (in the previous gist) that uses a keyword --- use --- for variable access.
I'll check these minor points with the folks in the Idris channel on freenode to see if these two things are possible.

How to work around F#'s type system

In Haskell, you can use unsafeCoerce to override the type system. How to do the same in F#?
For example, to implement the Y-combinator.
I'd like to offer a different solution, based on embedding the untyped lambda calculus in a typed functional language. The idea is to create a data type that allows us to change between types α and α → α, which subsequently allows us to escape the restrictions of a type system. I'm not very familiar with F#, so I'll give my answer in Haskell, but I believe it could be adapted easily (perhaps the only complication could be F#'s strictness).
-- | Roughly represents a morphism between @a@ and @a -> a@.
-- Therefore we can embed an arbitrary closed λ-term into @Any a@. Any time we
-- need to create a λ-abstraction, we just nest into one @Any@ constructor.
--
-- The type parameter allows us to embed ordinary values into the type and
-- retrieve results of computations.
data Any a = Any (Any a -> a)
Note that the type parameter isn't significant for combining terms. It just allows us to embed values into our representation and extract them later. All terms of a particular type Any a can be combined freely without restrictions.
-- | Embed a value into a λ-term. If viewed as a function, it ignores its
-- input and produces the value.
embed :: a -> Any a
embed = Any . const

-- | Extract a value from a λ-term, assuming it's a valid value (otherwise it'd
-- loop forever).
extract :: Any a -> a
extract x@(Any x') = x' x
With this data type we can use it to represent arbitrary untyped lambda terms. If we want to interpret a value of Any a as a function, we just unwrap its constructor.
First let's define function application:
-- | Applies a term to another term.
($$) :: Any a -> Any a -> Any a
(Any x) $$ y = embed $ x y
And λ abstraction:
-- | Represents a lambda abstraction
l :: (Any a -> Any a) -> Any a
l x = Any $ extract . x
Now we have everything we need for creating complex λ terms. Our definitions mimic the classical λ-term syntax, all we do is using l to construct λ abstractions.
Let's define the Y combinator:
-- λf.(λx.f(xx))(λx.f(xx))
y :: Any a
y = l (\f -> let t = l (\x -> f $$ (x $$ x))
             in t $$ t)
And we can use it to implement Haskell's classical fix. First we'll need to be able to embed a function of a -> a into Any a:
embed2 :: (a -> a) -> Any a
embed2 f = Any (f . extract)
Now it's straightforward to define
fix :: (a -> a) -> a
fix f = extract (y $$ embed2 f)
and subsequently a recursively defined function:
fact :: Int -> Int
fact = fix f
  where
    f _ 0 = 1
    f r n = n * r (n - 1)
Note that in the above text there is no recursive function. The only recursion is in the Any data type, which allows us to define y (which is also defined non-recursively).
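A quick check (my own) that this non-recursive machinery really does compute a factorial:
main :: IO ()
main = print (fact 5)  -- prints 120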
In Haskell, unsafeCoerce has the type a -> b and is generally used to assert to the compiler that the thing being coerced actually has the destination type and it's just that the type-checker doesn't know it.
Another, less common use, is to reinterpret a pattern of bits as another type. For example an unboxed Double# could be reinterpreted as an unboxed Int64#. You have to be sure about the underlying representations for this to be safe.
In F#, the first application can be achieved with box |> unbox as John Palmer said in a comment on the question. If possible use explicit type arguments to make sure that you don't accidentally have the wrong coercion inferred, e.g. box<'a> |> unbox<'b> where 'a and 'b are type variables or concrete types that are already in scope in your code.
For the second application, look at the BitConverter class for specific conversions of bit-patterns. In theory you could also do something like interfacing with unmanaged code to achieve this, but that seems very heavyweight.
These techniques won't work for implementing the Y combinator because the cast is only valid if the runtime objects actually do have the target type, but with the Y combinator you actually need to call the same function again but with a different type. For this you need the kinds of encoding tricks mentioned in the question John Palmer linked to.

Resources