Genetic Programming in Haskell - haskell

There is GenProg (http://hackage.haskell.org/package/genprog) for example, but that only deals with numerical optimization, in this case finding an equation that describes the data.
But I require for loops, if statements, when statements, Boolean checks etc. I need to be able to generate imperative structures. Any thought on this? My best options so far seem to be husk-scheme where I can run Scheme code as a DSL in Haskell. Surely there must be better ways?

If you're looking for something akin to S-expressions, this is fairly easily modeled in Haskell. Say, for example, we want to represent simple algebraic equations with variables, such as
x + 5 / (y * 2 - z)
This can be represented by a simple AST in Haskell, in particular we can implement it as
data Expr
= Lit Double -- Literal numbers
| Var Char -- Variables have single letter names
| Add Expr Expr -- We can add things together
| Sub Expr Expr -- And subtract them
| Mul Expr Expr -- Why not multiply, too?
| Div Expr Expr -- And divide
deriving (Eq)
This would let us express the equation above as
Add (Var 'x') (Div (Lit 5) (Sub (Mul (Var 'y') (Lit 2)) (Var 'z')))
But this is rather clunky to write and difficult to read. Let's start by working some Show magic so that it gets pretty printed:
instance Show Expr where
showsPrec n (Lit x) = showParen (n > 10) $ showsPrec 11 x
showsPrec n (Var x) = showParen (n > 10) $ showChar x
showsPrec n (Add x y) = showParen (n > 6) $ showsPrec 7 x . showString " + " . showsPrec 7 y
showsPrec n (Sub x y) = showParen (n > 6) $ showsPrec 7 x . showString " - " . showsPrec 7 y
showsPrec n (Mul x y) = showParen (n > 7) $ showsPrec 8 x . showString " * " . showsPrec 8 y
showsPrec n (Div x y) = showParen (n > 7) $ showsPrec 8 x . showString " / " . showsPrec 8 y
If you don't understand everything going on here, that's ok, it's a lot of complication made easy by some built in functions for efficiently building strings with parentheses in them properly and all that fun stuff. It's pretty much copied out of the docs in Text.Show. Now if we print out our expression from above, it'll look like
x + 5.0 / (y * 2.0 - z)
Now for simplifying building these expressions. Since they're pretty much numeric, we can implement Num (except for abs and signum) and Fractional:
instance Num Expr where
fromInteger = Lit . fromInteger
(+) = Add
(-) = Sub
(*) = Mul
abs = undefined
signum = undefined
instance Fractional Expr where
(/) = Div
fromRational = Lit . fromRational
Now we can input out expression from above as
Var 'x' + 5 / (Var 'y' * 2 - Var 'z')
This is at least much easier to visually parse, even if we have to specify the variables manually.
Now that we have pretty input and output, let's focus on evaluating these expressions. Since we have variables in here, we'll need some sort of environment that associates a variable with a value
import Data.Map (Map)
import qualified Data.Map as M
type Env = Map Char Double
And now it's just basic pattern matching (along with a helper function)
import Control.Applicative
binOp :: (Double -> Double -> Double) -> Expr -> Expr -> Env -> Maybe Double
binOp op x y vars = op <$> evalExpr x vars <*> evalExpr y vars
evalExpr :: Expr -> Env -> Maybe Double
evalExpr (Lit x) = const $ Just x
evalExpr (Var x) = M.lookup x
evalExpr (Add x y) = binOp (+) x y
evalExpr (Sub x y) = binOp (-) x y
evalExpr (Mul x y) = binOp (*) x y
evalExpr (Div x y) = binOp (/) x y
Now we can use evalExpr to evaluate an expression in our mini-language with variable substitution. All the error handling if there's an undefined variable is done by the Maybe monad, and we were even able to cut down on repetition by making the environment argument implicit. This is all pretty standard for a simple expression DSL.
So for the fun bit, generating random expressions and (eventually) mutations. For this, we'll need System.Random. We want to be able to generate expressions of varying complexity, so we'll express it in rough depth. Since it's random, there is a chance that we'll get shorter or deeper trees than specified. This will probably be something that you'll want to tweak and tune to get your desired results. First, because I have the foresight of having written this code already, let's define two helpers for generating a random literal and a random variable:
randomLit, randomVar :: IO Expr
randomLit = Lit <$> randomRIO (-100, 100)
randomVar = Var <$> randomRIO ('x', 'z')
Nothing exciting here, we get doubles between -100 and 100, and up to 3 variables. Now we can generate large expression trees.
generateExpr :: Int -> IO Expr
-- When the depth is 1, return a literal or a variable randomly
generateExpr 1 = do
isLit <- randomIO
if isLit
then randomLit
else randomVar
-- Otherwise, generate a tree using helper
generateExpr n = randomRIO (0, 100) >>= helper
where
helper :: Int -> IO Expr
helper prob
-- 20% chance that it's a literal
| prob < 20 = randomLit
-- 10% chance that it's a variable
| prob < 30 = randomVar
-- 15% chance of Add
| prob < 45 = (+) <$> generateExpr (n - 1) <*> generateExpr (n - 1)
-- 15% chance of Sub
| prob < 60 = (-) <$> generateExpr (n - 1) <*> generateExpr (n - 1)
-- 15% chance of Mul
| prob < 75 = (*) <$> generateExpr (n - 1) <*> generateExpr (n - 1)
-- 15% chance of Div
| prob < 90 = (/) <$> generateExpr (n - 1) <*> generateExpr (n - 1)
-- 10% chance that we generate a possibly taller tree
| otherwise = generateExpr (n + 1)
The bulk of this function is just specifying the probabilities that a given node will be generated, and then generating the left and right nodes for each operator. We even got to use the normal arithmetic operators since we overloaded Num, how handy!
Now, remember that we can still pattern match on the constructors of this Expr type for other operations, such as replacing nodes, swapping them, or measuring the depth. For this, you just have to treat it as any other binary tree type in Haskell, except it has 2 leaf constructors and 4 node constructors. As for mutations, you'll have to write code that traverses this structure and chooses when to swap out nodes and what to swap them out with. It'll live within the IO monad since you'll be generating random values, but it shouldn't be too difficult. This structure should be pretty easy to extend as need be, such as if you wanted to add trig functions and exponentiation, you'd just need more constructors, more expressions in evalExpr, and the appropriate clauses in helper, along with some probability adjustments.
You can get the full code for this example here. Hope this helps you see how to formulate something like S-expressions in Haskell.

Related

left associate an operation in haskell

I want to return the left association of an expression given (haskell), so if we have a + (b + c) the function has to return (a + b) + c, this also applies for the multiplication since these 2 operations are associative. taking onto consideration the other operations that are not associative therefor I have to put a condition to not left associate those expressions.
so the leftAssociate function turns every expression given into an equivalent left-associated one with the same constants
the function is defined as follow :
data Expr = Val Int | App Op Expr Expr
leftAssociate :: Expr -> Expr
leftAssociate (App Add (Val a) (App Add (Val b) (Val c))) = App Add (App Add (Val a) (Val b)) (Val c)
I have tried pattern matching but it's too overwhelming as there is a lot of possibilities and also I cannot specify the number of operations given as input as it's not limited.
You say
we consider it left-balances [sic] if it has no subexpressions of shape App Op x ( App Op y z) and that's only when the operation is Addition or Multiplication
I therefore propose that you structure your function for fixing this defect in the way you described:
leftAssociate (App Add x (App Add y z)) = -- ...
leftAssociate (App Mul x (App Mul y z)) = -- ...
leftAssociate (App op x y) = -- ...
leftAssociate (Val n) = -- ...
Presumably in the first three cases you will make recursive calls at some point to make sure that the x, y, and z subterms are also left-associated. This recursion will be the mechanism that allows you to handle arbitrarily large expressions.
I find it easiest to conceptualize this like so: for each subtree rooted at an App op _ _ node with an op you want to re-associate, you can collect all the terms at the top of the subtree being combined with that same App op into a flattened list, and then create a left associated tree with a foldl1 from that list. This gives the following solution:
data Expr = Val Int | App Op Expr Expr deriving (Show)
data Op = Add | Mul | Sub deriving (Show, Eq)
-- Identify which operators should be re-associated
isAssoc :: Op -> Bool
isAssoc Add = True
isAssoc Mul = True
isAssoc _ = False
leftAssociate :: Expr -> Expr
leftAssociate a#(App op _ _)
| isAssoc op = foldl1 (App op) $ opTerms a
where opTerms :: Expr -> [Expr]
opTerms (App op' x' y') | op' == op = opTerms x' ++ opTerms y'
opTerms e = [leftAssociate e]
leftAssociate (App op x y) = App op (leftAssociate x) (leftAssociate y)
leftAssociate e = e
You can technically get rid of the intermediate flattened list by constructing the left associated tree directly and defining a concatenation function for left associated trees, and I think that would give you the sort of direct recursive solution being discussed in the other answer, but I found this version easier to write.
Here's a test case:
mul = App Mul
add = App Add
sub = App Sub
ex1 = leftAssociate (add (Val 1) (sub (mul (Val 2) (mul (Val 3) (Val 4)))
(add (add (Val 5) (Val 6)) (add (Val 7) (Val 8)))))
main = print $ leftAssociate ex1
TL;DR:
instead of turning:
+
/ \
/ \
a +
/ \
/ \
b c
into
+
/ \
/ \
+ c
/ \
/ \
a b
I'd suggest to alter the tree to be non-binary:
+
/|\
/ | \
a b c
Longer Explanation
if we have a + (b + c) the function has to return (a + b) + c
I would not do that. I would choose a different representation than Expr = Val Int | App Op Expr Expr. Expr as binary tree is not associative. How about
data AssocExpr = AVal Int | AApp Op [Expr]
Lists are associative by design. You can keep both data structures around and convert to AssocExpr when you need the associativity. And if you really need it, you can still convert back and thereby achieve the initial goal.
To transform from Expr to AssocExpr, you can modify leftAssociate from K. A. Buhr's answer and eliminate the foldl1:
associate :: Expr -> AssocExpr
associate a#(App op _ _)
| isAssoc op = AApp op $ opTerms a
...
Related Codereview (from myself):
https://codereview.stackexchange.com/questions/254288/parametrize-by-type-constructor/254572#254572

Define algebra over GF4

How to define the algebraic operations over finite field power 4 (GF4) in Haskell?
I have numbers: 0, 1, 2, 3
And the operators could look like this:
(+) x y = (x + y) `mod` 4
(*) 0 y = 0
(*) 1 y = y
(*) x 0 = 0
(*) x 1 = x
(*) 2 2 = 3
(*) 3 3 = 2
(*) 2 3 = 1
(*) 3 2 = 1
Note: (*) is not multiple mod 4!
I want to get something like this:
3 * 2 :: GF4 == 1 :: GF4
I write:
class GF4 x where
(+), (*) :: x -> x -> x
instance GF4 where
0 + 0 = 0
...
2 * 3 = 1
...
But unsuccessfully! How to write an implement of this operators by type class or type?
Like this:
data GF4 = GF4_0 | GF4_1 | GF4_2 | GF4_3
deriving (Bounded, Enum, Eq, {- Ord, maybe? -} Read, Show)
instance Num GF4 where
-- A small trick to avoid having to write all the cases by hand:
-- reuse the `Num Int` instance and use the `Enum GF4` instance to
-- convert back and forth.
-- BUT note that, though this was the original question's spec for
-- (+), this is not how addition in GF4 is usually defined. Thanks
-- to chi for pointing it out. Presumably the definition for x - y
-- below also needs to be updated; perhaps defining negate instead
-- would involve less repetition.
x + y = toEnum ((fromEnum x + fromEnum y) `mod` 4)
GF4_0 * y = 0
GF4_1 * y = y
GF4_2 * GF4_2 = GF4_3
-- etc.
-- and a couple other bookkeeping functions; see :info Num or the haddocks
x - y = toEnum ((fromEnum x - fromEnum y) `mod` 4)
fromInteger n = toEnum (fromInteger (n `mod` 4))
abs = id
signum = toEnum . signum . fromEnum
Now you can try it out in ghci:
> (3 * 2 :: GF4) == (1 :: GF4)
True
Another option that makes the Num instance less tedious is to explicitly represent it as a polynomial with mod-2 coefficients. I'll pull a silly trick I've pulled a few times before to treat Bool as mod-2 numbers (with False representing 0 and True representing 1):
instance Num Bool where
(+) = (/=)
(*) = (&&)
negate = not
abs = id
signum = id
fromInteger = odd
(An aside for the Haskell experts: if the orphan instance makes you queasy, feel free to define data Bit = O | I and write out the Num instance a bit more explicitly.)
Now we define GF4 to have two fields ("fields" in the programming sense, not the number theory sense):
data GF4 = Bool :×+ Bool deriving (Eq, {- Ord, maybe? -} Read, Show)
The ×+ is supposed to be a bit of a visual pun: we'll represent ax + b as a:×+b. Now the (corrected) Num instance looks quite a bit more ordered:
instance Num GF4 where
(a:×+b) + (a':×+b') = (a + a'):×+(b + b')
(a:×+b) * (a':×+b') = (a*a' + a*b' + b*a'):×+(a*a' + b*b')
negate = id
abs = id
signum (a:×+b) = 0:×+(a*b)
fromInteger n = 0:×+fromInteger n
x :: GF4
x = 1:×+0
Unlike the previous instance, not all inhabitants of this GF4 are available as literal numbers -- only the constant polynomials. So we define an extra value, x, to give access to the non-constant polynomials. (Or you can use the constructor directly.) Now we can try out your example in ghci; what you call 2 I call x, and what you call 3 I call x+1.
> (x+1) * x == 1
True
As #WillemVanOnsem says in the comments, GF4 should be a data type, rather than a typeclass. Despite the name, they are totally different things! A typeclass is a collection of functions which are general enough that they can have similar implementations for multiple different types; a data type is nearly the reverse, in that it defines a totally new type which the users may use as they wish.
So how do you define GF4 as a data type? The ‘simplest’ way (for one definition of ‘simplest’) is to simply define it as a wrapper around Int:
newtype GF4 = GF4 Int
(Quick note: in case you haven’t run into them before, newtypes are a special kind of data type; they are used when you want to give a new name to another type by wrapping it. See e.g. LYAH for the difference between newtypes and datas.)
Now, note that (+) and (*) are members of the Num typeclass — this makes sense, since you can implement those functions for a wide range of types — so now you can write a Num instance:
instance Num GF4 where
(+) (GF4 x) (GF4 y) = GF4 ((x + y) `mod` 4)
(*) (GF4 0) (GF4 y) = GF4 0
(*) (GF4 1) (GF4 y) = GF4 y
-- and so on and so forth
-- but Num also has some other functions; let’s implement those too
negate (GF 0) = GF 0
negate (GF 1) = (GF 3)
negate (GF 2) = GF 2
-- note that a ‘negate’ implementation automatically gives you (-) as well
abs x = x
signum x = x
-- this is an unsafe function — usually you’d avoid them, but it’s the
-- only way to implement this one
fromInteger x = if 0 <= x && x < 4 then GF (fromInteger x) else error "value out of bounds!"
Then, you can export the name of the type GF4, but not the constructor GF4 :: Int -> GF4; thus outside people can use your type, but cannot construct invalid values like GF4 30.
Yet there is a better way. Note that GF4 only has four values — so it’s totally feasible to define this as an enumeration:
data GF4 = GF0 | GF1 | GF2 | GF3
This way, you can export everything, and still have it impossible by design to construct invalid values. This is considered good practice in Haskell; for this reason alone, I would use this definition rather than the newtype one. The implementation of Num is very similar to that given above; for this reason I won’t write the whole thing out again, but you should be able to easily figure it out.
You need to specify a type in our instance declaration. E.g.
instance GF4 Int where
0 + 0 = 0
2 * 3 = 1
To use it you still need to hide (+) and (*) from Prelude:
import Prelude hiding ((*), (+))
Now you can start using your GF4 instance:
one :: Int
one = 2 * 3

Haskell: Convert an expression into a list of Instructions

This is a Homework Problem.
My goal is to Convert a type Expression in the form of " into a list of CPU Instructions. Given the data structures
data Expr = Const Double | Var String | Add Expr Expr | Mult Expr Expr
data Instr = LoadImmediate RegisterNumber -- put value here
RegisterValue -- the value
| Addition RegisterNumber -- put result here
RegisterNumber -- first thing to add
RegisterNumber -- second thing to multiply
| Multiply RegisterNumber -- put result here
RegisterNumber -- first thing to multiply
RegisterNumber -- second thing to multiply
type RegisterNumber = Int
type RegisterValue = Double
Type Expression has four main functions
Const: converts a number of type double to a type expression, letting you use it.
Var: basically converts a string (i.e. "x") into a type expression letting you apply it to constant
Then then Add and Mult commands that let you add and multiply two type expressions
And we can assume that the only variable we will see is "x" and it is already in register 2. The result will arrive in register 1. There is a total of 4 registers.
So Add Const 1 (Mult (Var "x") (Const 2))
in type [Instr] would be
[LoadImmediate 3 1,
LoadImmediate 4 2,
Multiply 1 4 2,
Addition 1 3 1]
EDIT: Sorry, I for got to mention, because this is a beginner course, we only need to consider expressions of the form
(Const a) `Add` ((Var "x") `Mult` Expr)
where 'Expr' is some expression. Or in math form a0+x(a1+x(a2+x...))
I fixed my code up a little bit, now the error I'm getting is "Not in scope: data constructor 'RegNum'"
expr2Code :: Expr -> [Instr]
expr2Code expr = e2C 1 expr
e2C:: Int -> Expr -> Instr
e2C RegNum (Const y) = [LoadImmediate RegNum y]
e2C RegNum (Var "x") = []
e2C RegNum (Add expr1 expr2) = e2C 3 expr1 ++ e2C 4 expr2 ++ [Addition RegNum 3 4]
e2C RegNum (Mult expr1 expr2) = e2C 3 expr1 ++ e2C 4 expr2 ++ [Multiply RegNum 3 4]
Sorry for the long post, any help would be appreciated.
Well I'm assuming you have an infinite number of registers. If not you can experience the joy that is register spilling, but you'd need some more instructions to deal with dynamic memory.
There are 3 straightforward ways to do this
Expressions -> SSA -> Instr
Expressions -> CPS -> Instr
Expressions -> Instr
The first 2 offer much easier opportunities to optimize your use of registers and what not, but involve an intermediate language. Since we're lazy, let's do 3.
import Control.Monad.State
type Gensym = State Int
gensym :: Gensym RegisterNumber
gensym = modify (+1) >> fmap (subtract 1) get
Now that we have a way of uniquely generating registers, we can do the wonderfully inefficient approach.
withRegister :: (RegisterNumber -> [Instr]) -> Gensym (RegisterNumber, [Instr])
withRegister f = gensym >>= \r -> return (r, f r)
compile :: Expr -> Gensym (RegisterNumber, [Instr])
compile (Const d) = withRegister $ \r -> [LoadImmediate r d]
compile (Var "x") = return (2, [])
compile (Add e1 e2) = do
(t1, c1) <- compile e1 -- Recursively compile
(t2, c2) <- compile e2 -- Each subexpression like you were
withRegister $ \r -> Addition t1 t2 r : (c1 ++ c2) -- Make sure we
-- use a unique register
compile (Mult e1 e2) = do
(t1, c1) <- compile e1
(t2, c2) <- compile e2
withRegister $ \r -> Multiply t1 t2 r : (c1 ++ c2)
compileExpr :: Expr -> [Instr]
compileExpr = reverse . snd . flip evalState 3 . compile
This basically recursively compiles each expression, concatting their various chunks of code together. This is similar to what you had, but there are 3 major differences,
I'm using a monad to handle the registers for me.
You have to ensure that you never clobber a register you're going to need, by using a monad I ensure that all the registers I'm using are unique. This is really inefficient,but trivially correct.
When handling Var, I just load whatever's in register 2, since LoadImmediate wants a double and you have no mechanism for actually binding variables in your expressions.
Because you're not dealing with expressions, each chunk of computation has to be stuck in a register explicitly. You can't do x + y * z any more. That's why if you look at the code for Add or Mult, each subexpression is compiled to a fresh register.
Your example generates
[LoadImmediate 4 2.0,Multiply 2 4 5,LoadImmediate 3 1.0,Addition 3 5 6]
Which is correct, it multiplies 4 and x, then adds 3.
e2C _ (Var "x") = LoadImmediate 2 "x"
If x is already in the register 2 you don't need to load it at all. Var "x" would not translate to any load operation. rather, it translates to the operand of 2 in some other operation (addition or multiplication). For instance, (Add (Const 25) (Var "x")) would translate to [LoadImmediate 3 25, Addition 1 3 2].
e2C _ (Mult x y) = Multiply 4 (e2C _ x) (e2C _ y)
This of course does not work, as operands of Multiply are registers, not instructions. You have to translate x and note to which register rx the result goes; then translate y and note to which register ry its result goes; make sure rx != xy; and finally, issue a Multiply rz rx ry.
Now, how to determine rz, and how to make sure rx != ry? One simple strategy is to make sure each result goes to its own register (assuming there's an infinite number of them, which is of course not true for real machine architectures).
The top-level result will go to register 1.

Strange pattern matching with functions instancing Show

So I'm writing a program which returns a procedure for some given arithmetic problem, so I wanted to instance a couple of functions to Show so that I can print the same expression I evaluate when I test. The trouble is that the given code matches (-) to the first line when it should fall to the second.
{-# OPTIONS_GHC -XFlexibleInstances #-}
instance Show (t -> t-> t) where
show (+) = "plus"
show (-) = "minus"
main = print [(+),(-)]
returns
[plus,plus]
Am I just committing a mortal sin printing functions in the first place or is there some way I can get it to match properly?
edit:I realise I am getting the following warning:
Warning: Pattern match(es) are overlapped
In the definition of `show': show - = ...
I still don't know why it overlaps, or how to stop it.
As sepp2k and MtnViewMark said, you can't pattern match on the value of identifiers, only on constructors and, in some cases, implicit equality checks. So, your instance is binding any argument to the identifier, in the process shadowing the external definition of (+). Unfortunately, this means that what you're trying to do won't and can't ever work.
A typical solution to what you want to accomplish is to define an "arithmetic expression" algebraic data type, with an appropriate show instance. Note that you can make your expression type itself an instance of Num, with numeric literals wrapped in a "Literal" constructor, and operations like (+) returning their arguments combined with a constructor for the operation. Here's a quick, incomplete example:
data Expression a = Literal a
| Sum (Expression a) (Expression a)
| Product (Expression a) (Expression a)
deriving (Eq, Ord, Show)
instance (Num a) => Num (Expression a) where
x + y = Sum x y
x * y = Product x y
fromInteger x = Literal (fromInteger x)
evaluate (Literal x) = x
evaluate (Sum x y) = evaluate x + evaluate y
evaluate (Product x y) = evaluate x * evaluate y
integer :: Integer
integer = (1 + 2) * 3 + 4
expr :: Expression Integer
expr = (1 + 2) * 3 + 4
Trying it out in GHCi:
> integer
13
> evaluate expr
13
> expr
Sum (Product (Sum (Literal 1) (Literal 2)) (Literal 3)) (Literal 4)
Here's a way to think about this. Consider:
answer = 42
magic = 3
specialName :: Int -> String
specialName answer = "the answer to the ultimate question"
specialName magic = "the magic number"
specialName x = "just plain ol' " ++ show x
Can you see why this won't work? answer in the pattern match is a variable, distinct from answer at the outer scope. So instead, you'd have to write this like:
answer = 42
magic = 3
specialName :: Int -> String
specialName x | x == answer = "the answer to the ultimate question"
specialName x | x == magic = "the magic number"
specialName x = "just plain ol' " ++ show x
In fact, this is just what is going on when you write constants in a pattern. That is:
digitName :: Bool -> String
digitName 0 = "zero"
digitName 1 = "one"
digitName _ = "math is hard"
gets converted by the compiler to something equivalent to:
digitName :: Bool -> String
digitName x | x == 0 = "zero"
digitName x | x == 1 = "one"
digitName _ = "math is hard"
Since you want to match against the function bound to (+) rather than just bind anything to the symbol (+), you'd need to write your code as:
instance Show (t -> t-> t) where
show f | f == (+) = "plus"
show f | f == (-) = "minus"
But, this would require that functions were comparable for equality. And that is an undecidable problem in general.
You might counter that you are just asking the run-time system to compare function pointers, but at the language level, the Haskell programmer doesn't have access to pointers. In other words, you can't manipulate references to values in Haskell(*), only values themselves. This is the purity of Haskell, and gains referential transparency.
(*) MVars and other such objects in the IO monad are another matter, but their existence doesn't invalidate the point.
It overlaps because it treats (+) simply as a variable, meaning on the RHS the identifier + will be bound to the function you called show on.
There is no way to pattern match on functions the way you want.
Solved it myself with a mega hack.
instance (Num t) => Show (t -> t-> t) where
show op =
case (op 6 2) of
8 -> "plus"
4 -> "minus"
12 -> "times"
3 -> "divided"

Symbolic simplification in Haskell (using recursion?)

How can I give a general rule that includes all the expressions below?
E.g one expression, another one for sub and one for mult.
I need to use recursion but i got confused...
simplify :: Expr->Expr
simplify (Mult (Const 0)(Var"x"))
= Const 0
simplify (Mult (Var "x") (Const 0))
= Const 0
simplify (Plus (Const 0) (Var "x"))
= Var "x"
simplify (Plus (Var "x") (Const 0))
= Var "x"
simplify (Mult (Const 1) (Var"x"))
= Var "x"
simplify (Mult(Var"x") (Const 1))
= Var "x"
simplify (Minus (Var"x") (Const 0))
= Var "x"
simplify (Plus (Const x) (Const y))
= Const (x + y)
simplify (Minus (Const x) (Const y))
= Const (x - y)
simplify (Mult (Const x) (Const y))
= Const (x * y)
simplify x = x
First up: I know reasonably little about Haskell, and my total time spent programming the language is no more than 8 hours spread over 5 years or so, though I have read plenty about the language. Thus, forgive my no doubt horrible style.
I tackled this problem since it looked like an easy way to get into a little bit of Haskell programming. First up, I made a data type inspired by the sample:
data Expr = Const Int | Mult Expr Expr | Plus Expr Expr | Var String
I made it a little simpler than the original, and left out Minus, but otherwise it's the same.
I quickly found out that values constructed using e.g. "Const 4" were not printable with the above, as there was no show function applicable. I made Expr an instance of the Show type class, and provided a simple definition of show, taking operator precedence into account:
instance Show Expr where
show (Const n) = show n
show (Var n) = show n
show (Plus a b) = (show a) ++ "+" ++ (show b)
show (Mult a b) = "(" ++ (show a) ++ ") * (" ++ (show b) ++ ")"
Next up was the simplification task itself. As Glomek hints, there's an issue with trying to evaluate everything just using pattern matching in one function.
Specifically, for any given operation (all operations in the example are binary) you'd like to first simplify the left tree, then the right tree, and then simplify the current Expr based on what those subtrees evaluated to; e.g. if both simplified to Const, then you can replace the entire subtree with the evaluated operation. However, pattern matching forces you to choose what to do based on the immediate node's children, rather than what the subtrees return after being simplified themselves.
Thus, to get pattern-matching to do the job of deciding whether to evaluate the current node or not as a constant subexpression, it's important to simplify the subtrees, then dispatch based on the simplified whole.
I did this using a separate function I called eval, whose purpose is to pattern-match on things that can be reduced, assuming that subtrees have already been reduced. It also handles multiplication by 0 and 1, and addition of 0:
-- Tries to evaluate any constant expressions.
eval :: Expr -> Expr
eval (Mult (Const a) (Const b)) = Const (a * b)
eval (Mult (Const a) b)
| a == 0 = Const 0
| a == 1 = b
| otherwise = (Mult (Const a) b)
eval (Mult a (Const b))
| b == 0 = Const 0
| b == 1 = a
| otherwise = (Mult a (Const b))
eval (Plus (Const a) (Const b)) = Const (a + b)
eval (Plus (Const a) b)
| a == 0 = b
| otherwise = (Plus (Const a) b)
eval (Plus a (Const b))
| b == 0 = a
| otherwise = (Plus a (Const b))
eval e = e
Now that I have eval, and I know it's safe to call at the top level of an expression tree (i.e. it won't infinitely recurse), I can call it from simplify after I've simplified the subtrees:
-- Tries to match evaluation rules after simplifying subtrees.
simplify :: Expr -> Expr
simplify (Plus a b) = eval (Plus (simplify a) (simplify b))
simplify (Mult a b) = eval (Mult (simplify a) (simplify b))
simplify e = e
This version of simplify has many limitations: it won't distribute a multiplication over a non-Const subtree, it won't reorder the expression to bring constant expressions together (so the equivalent of 1+a+2 won't get simplified to a+3), etc. However, it gets the basic jobs done.
The recursion comes in when you need to deal with nested expressions. For instance, how do you simply (Plus (Plus 2 3) (Plus 4 5))?
One approach is to break it into two functions. Move the one level logic (which you show above) into its own function. The main simplify function might have a rule similar to the following for Plus:
simplify (Plus x y) = simplify_one_level (Plus (simplify x) (simplify y))

Resources