haskell - types - functions - trees - haskell

For haskell practice I want to implement a game where students/pupils should learn some algebra playfully.
As basic datatype I want to use a tree:
with nodes that have labels and algebraic operators stored.
with leaves that have labels and variables (type String) or numbers
Now I want to define something like
data Tree = Leaf {l :: Label, val :: Expression}
| Node {l :: Label, f :: Fun, lBranch :: Tree, rBranch :: Tree}
data Fun = "one of [(+),(*),(-),(/),(^)]"
-- type Fun = Int -> Int
would work
Next things I think about is to make a 'equivalence' of trees - as multiplication/addition is commutative and one can simplify additions to multiplication etc. the whole bunch of algebraic operations.
I also have to search through the tree - by label I think is best, is this a good approach.
Any ideas what tags/phrases to look for and how to solve the "data Fun".

To expand a bit on Edward Z. Yang's answer:
The simplest way to define your operators here is probably as a data type, along with the types for atomic values in leaf nodes and the expression tree as a whole:
data Fun = Add | Mul | Sub | Div | Exp deriving (Eq, Ord, Show)
data Val a = Lit a | Var String deriving (Eq, Ord, Show)
data ExprTree a = Node String Fun (ExprTree a) (ExprTree a)
| Leaf String (Val a)
deriving (Eq, Ord, Show)
You can then define ExprTree a as an instance of Num and whatnot:
instance (Num a) => Num (ExprTree a) where
(+) = Node "" Add
(*) = Node "" Mul
(-) = Node "" Sub
negate = Node "" Sub 0
fromInteger = Leaf "" . Lit
...which allows creating unlabelled expressions in a very natural way:
*Main> :t 2 + 2
2 + 2 :: (Num t) => t
*Main> 2 + 2 :: ExprTree Int
Node "" Add (Leaf "" (Lit 2)) (Leaf "" (Lit 2))
Also, note the deriving clauses above on the data definitions, particularly Ord; this tells the compiler to automatically create an ordering relation on values of that type. This lets you sort them consistently which means you can, for instance, define a canonical ordering on subexpressions so that when rearranging commutative operations you don't get stuck in a loop. Given some canonical reductions and subexpressions in canonical order, in most cases you'll then be able to use the automatic equality relation given by Eq to check for subexpression equivalence.
Note that labels will affect the ordering and equality here. If that's not desired, you'll need to write your own definitions for Eq and Ord, much like the one I gave for Num.
After that, you can write some traversal and reduction functions, to do things like apply operators, perform variable substitution, etc.

It looks like you want to construct a symbolic algebra system. There is a large and varied literature on the subject.
You don't want to represent operators as Int -> Int, because then you can't check what operation any given function implements and then implement peephole optimization for things like simplification, etc. So a simple enumerated data type would do the trick, and then write the function eval which actually evaluates your tree.

Related

The limit set of types with new data like `Tree a`

Exploring and studing type system in Haskell I've found some problems.
1) Let's consider polymorphic type as Binary Tree:
data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving Show
And, for example, I want to limit my considerations only with Tree Int, Tree Bool and Tree Char. Of course, I can make a such new type:
data TreeIWant = T1 (Tree Int) | T2 (Tree Bool) | T3 (Tree Char) deriving Show
But could it possible to make new restricted type (for homogeneous trees) in more elegant (and without new tags like T1,T2,T3) way (perhaps with some advanced type extensions)?
2) Second question is about trees with heterogeneous values. I can do them with usual Haskell, i.e. I can do the new helping type, contained tagged heterogeneous values:
data HeteroValues = H1 Int | H2 Bool | H3 Char deriving Show
and then make tree with values of this type:
type TreeH = Tree HeteroValues
But could it possible to make new type (for heterogeneous trees) in more elegant (and without new tags like H1,H2,H3) way (perhaps with some advanced type extensions)?
I know about heterogeneous list, perhaps it is the same question?
For question #2, it's easy to construct a "restricted" heterogeneous type without explicit tags using a GADT and a type class:
{-# LANGUAGE GADTs #-}
data Thing where
T :: THING a => a -> Thing
class THING a
Now, declare THING instances for the the things you want to allow:
instance THING Int
instance THING Bool
instance THING Char
and you can create Things and lists (or trees) of Things:
> t1 = T 'a' -- Char is okay
> t2 = T "hello" -- but String is not
... type error ...
> tl = [T (42 :: Int), T True, T 'x']
> tt = Branch (Leaf (T 'x')) (Leaf (T False))
>
In terms of the type names in your question, you have:
type HeteroValues = Thing
type TreeH = Tree Thing
You can use the same type class with a new GADT for question #1:
data ThingTree where
TT :: THING a => Tree a -> ThingTree
and you have:
type TreeIWant = ThingTree
and you can do:
> tt1 = TT $ Branch (Leaf 'x') (Leaf 'y')
> tt2 = TT $ Branch (Leaf 'x') (Leaf False)
... type error ...
>
That's all well and good, until you try to use any of the values you've constructed. For example, if you wanted to write a function to extract a Bool from a possibly boolish Thing:
maybeBool :: Thing -> Maybe Bool
maybeBool (T x) = ...
you'd find yourself stuck here. Without a "tag" of some kind, there's no way of determining if x is a Bool, Int, or Char.
Actually, though, you do have an implicit tag available, namely the THING type class dictionary for x. So, you can write:
maybeBool :: Thing -> Maybe Bool
maybeBool (T x) = maybeBool' x
and then implement maybeBool' in your type class:
class THING a where
maybeBool' :: a -> Maybe Bool
instance THING Int where
maybeBool' _ = Nothing
instance THING Bool where
maybeBool' = Just
instance THING Char where
maybeBool' _ = Nothing
and you're golden!
Of course, if you'd used explicit tags:
data Thing = T_Int Int | T_Bool Bool | T_Char Char
then you could skip the type class and write:
maybeBool :: Thing -> Maybe Bool
maybeBool (T_Bool x) = Just x
maybeBool _ = Nothing
In the end, it turns out that the best Haskell representation of an algebraic sum of three types is just an algebraic sum of three types:
data Thing = T_Int Int | T_Bool Bool | T_Char Char
Trying to avoid the need for explicit tags will probably lead to a lot of inelegant boilerplate elsewhere.
Update: As #DanielWagner pointed out in a comment, you can use Data.Typeable in place of this boilerplate (effectively, have GHC generate a lot of boilerplate for you), so you can write:
import Data.Typeable
data Thing where
T :: THING a => a -> Thing
class Typeable a => THING a
instance THING Int
instance THING Bool
instance THING Char
maybeBool :: Thing -> Maybe Bool
maybeBool = cast
This perhaps seems "elegant" at first, but if you try this approach in real code, I think you'll regret losing the ability to pattern match on Thing constructors at usage sites (and so having to substitute chains of casts and/or comparisons of TypeReps).

overriding default Eq definition for specific pattern

Suppose I have defined some data type that derives Eq but want to insert my own definition of (==) for some pattern. Is there any way to do this or do I have to define (==) for every pattern?
e.g.
data Asdf = One Char | Two Char Char --(deriving Eq)
instance Eq Asdf where
(==) (One _) (One _) = True
--otherwise use what the derived definition would have done
--can I do this without defining these patterns myself?
To do what you're trying to do, you have to define it yourself, and that means you have to define it for every pattern.
Basically data MyType x = A x | B x x deriving (Eq) will add a default derivation equivalent to,
instance Eq x => Eq (MyType x) where
A x1 == A x2 = x1 == x2
B x1 x2 == B x3 x4 = x1 == x3 && x2 == x4
_ == _ = False
Note that it figures out the necessary dependencies (the Eq x => part above) as well as fills in the diagonal cases -- the special cases among the n2 possible matches where the same constructor was used.
As far as I know, it does this definition all-at-once and there is no way to dig into an existing instance declaration to mess with it -- and there is a good reason for this; if they let you do this then that would mean that as codebases grow, you could not look at an instance derivation or a deriving (Eq) clause and be confident that you know exactly what it means, since some other part of the code might monkey-patch that Eq instance to do something nefarious.
So one way is to redefine the diagonal yourself. But that's not the only way. There is at least one alternative which may work if it's easier to modify several usage sites than to shove all n constructors into a single thing:
newtype EverythingIsEqual x = E x deriving (Show)
instance Eq (EverythingIsEqual x) where
_ == _ = True
data MyType x = A (EverythingIsEqual x) | B x x deriving (Show, Eq, Ord)
This newtype allows you to strategically modify certain terms to have a different Eq relation at no runtime cost -- in fact this is pretty much one of the two central arguments for newtypes; aside from the lesser one where "I want to have a type-level difference between these two Strings but they ARE just strings and I don't want to pay any performance penalty," there is the greater argument of "sometimes we want to tell Haskell to use a different Ord dictionary without messing with any of the values that this dictionary acts upon, we just want to swap out the functions."
This question discusses how to do something very similar for the Show instance, using the https://hackage.haskell.org/package/generic-deriving package: Accessing the "default show" in Haskell?
See this answer in particular: https://stackoverflow.com/a/35385768/936310
I recently used it for the Show instance recently, and it worked wonderfully. You can similarly derive Eq as well for your type, assuming it's regular enough.

Factoring out recursion in a complex AST

For a side project I am working on I currently have to deal with an abstract syntax tree and transform it according to rules (the specifics are unimportant).
The AST itself is nontrivial, meaning it has subexpressions which are restricted to some types only. (e.g. the operator A must take an argument which is of type B only, not any Expr. A drastically simplified reduced version of my datatype looks like this:
data Expr = List [Expr]
| Strange Str
| Literal Lit
data Str = A Expr
| B Expr
| C Lit
| D String
| E [Expr]
data Lit = Int Int
| String String
My goal is to factor out the explicit recursion and rely on recursion schemes instead, as demonstrated in these two excellent blog posts, which provide very powerful general-purpose tools to operate on my AST. Applying the necessary factoring, we end up with:
data ExprF a = List [a]
| Strange (StrF a)
| Literal (LitF a)
data StrF a = A a
| B a
| C (LitF a)
| D String
| E [a]
data LitF a = Int Int
| String String
If I didn't mess up, type Expr = Fix ExprF should now be isomorphic to the previously defined Expr.
However, writing cata for these cases becomes rather tedious, as I have to pattern match B a :: StrF a inside of an Str :: ExprF a for cata to be well-typed. For the entire original AST this is unfeasible.
I stumbled upon fixing GADTs, which seems to me like it is a solution to my problem, however the user-unfriendly interface of the duplicated higher-order type classes etc. is quite the unneccessary boilerplate.
So, to sum up my questions:
Is rewriting the AST as a GADT the correct way to go about this?
If yes, how could I transform the example into a well-working version? On a second note, is there better support for higher kinded Functors in GHC now?
If you've gone through the effort of to separate out the recursion in your data type, then you can just derive Functor and you're done. You don't need any fancy features to get the recursion scheme. (As a side note, there's no reason to parameterize the Lit data type.)
The fold is:
newtype Fix f = In { out :: f (Fix f) }
gfold :: (Functor f) => (f a -> a) -> Fix f -> a
gfold alg = alg . fmap (gfold alg) . out
To specify the algebra (the alg parameter), you need to do a case analysis against ExprF, but the alternative would be to have the fold have a dozen or more parameters: one for each data constructor. That wouldn't really save you much typing and would be much harder to read. If you want (and this may require rank-2 types in general), you can package all those parameters up into a record and then you could use record update to update "pre-made" records that provide "default" behavior in various circumstances. There's an old paper Dealing with Large Bananas that takes an approach like this. What I'm suggesting, to be clear, is just wrapping the gfold function above with a function that takes a record, and passes in an algebra that will do the case analysis and call the appropriate field of the record for each case.
Of course, you could use GHC Generics or the various "generic/polytypic" programming libraries like Scrap Your Boilerplate instead of this. You are basically recreating what they do.

Set-like Data Structure without `Ord`?

Given the following types:
import Data.Set as Set
-- http://json.org/
type Key = String
data Json = JObject Key (Set JValue)
| JArray JArr
deriving Show
data JObj = JObj Key JValue
deriving Show
data JArr = Arr [JValue] deriving Show
data Null = Null deriving Show
data JValue = Num Double
| S String
| B Bool
| J JObj
| Array JArr
| N Null
deriving Show
I created a JObject Key (Set Value) with a single element:
ghci> JObject "foo" (Set.singleton (B True))
JObject "foo" (fromList [B True])
But, when I tried to create a 2-element Set, I got a compile-time error:
ghci> JObject "foo" (Set.insert (Num 5.5) $ Set.singleton (B True))
<interactive>:159:16:
No instance for (Ord JValue) arising from a use of ‘insert’
In the expression: insert (Num 5.5)
In the second argument of ‘JObject’, namely
‘(insert (Num 5.5) $ singleton (B True))’
In the expression:
JObject "foo" (insert (Num 5.5) $ singleton (B True))
So I asked, "Why is it necessary for JValue to implement the Ord typeclass?"
The docs on Data.Set answer that question.
The implementation of Set is based on size balanced binary trees (or trees of bounded balance)
But, is there a Set-like, i.e. non-ordered, data structure that does not require Ord's implementation that I can use?
You will pretty much always need at least Eq to implement a set (or at least the ability to write an Eq instance, whether or not one exists). Having only Eq will give you a horrifyingly inefficient one. You can improve this with Ord or with Hashable.
One thing you might want to do here is use a trie, which will let you take advantage of the nested structure instead of constantly fighting it.
You can start by looking at generic-trie. This does not appear to offer anything for your Array pieces, so you may have to add some things.
Why Eq is not good enough
The simplest way to implement a set is using a list:
type Set a = [a]
member a [] = False
member (x:xs) | a == x = True
| otherwise = member a xs
insert a xs | member a xs = xs
| otherwise = a:xs
This is no good (unless there are very few elements), because you may have to traverse the entire list to see if something is a member.
To improve matters, we need to use some sort of tree:
data Set a = Node a (Set a) (Set a) | Tip
There are a lot of different kinds of trees we can make, but in order to use them, we must be able, at each node, to decide which of the branches to take. If we only have Eq, there is no way to choose the right one. If we have Ord (or Hashable), that gives us a way to choose.
The trie approach structures the tree based on the structure of the data. When your type is deeply nested (a list of arrays of records of lists...), either hashing or comparison can be very expensive, so the trie will probably be better.
Side note on Ord
Although I don't think you should use the Ord approach here, it very often is the right one. In some cases, your particular type may not have a natural ordering, but there is some efficient way to order its elements. In this case you can play a trick with newtype:
newtype WrappedThing = Wrap Thing
instance Ord WrappedThing where
....
newtype ThingSet = ThingSet (Set WrappedThing)
insertThing thing (ThingSet s) = ThingSet (insert (Wrap thing) s)
memberThing thing (ThingSet s) = member (WrapThing) s
...
Yet another approach, in some cases, is to define a "base type" that is an Ord instance, but only export a newtype wrapper around it; you can use the base type for all your internal functions, but the exported type is completely abstract (and not an Ord instance).

Does Haskell have return type overloading?

Based on what I've read about Haskell, and the experimentation I've done with GHC, it seems like Haskell has return type overloading (aka ad hoc polymorphism). One example of this is the fromInteger function which can give you a Double or an Integer depending on where the result is used. For example:
fd :: Double -> String
fd x = "Double"
fi :: Integer -> String
fi x = "Integer"
fd (fromInteger 5) -- returns "Double"
fi (fromInteger 5) -- returns "Integer"
A Gentle Introduction to Haskell seems to agree with this when it says:
The kind of polymorphism that we have talked about so far is commonly called parametric polymorphism. There is another kind called ad hoc polymorphism, better known as overloading. Here are some examples of ad hoc polymorphism:
The literals 1, 2, etc. are often used to represent both fixed and arbitrary precision integers.
If the numeric literals are considered to be an example of ad hoc polymorphism (aka overloading), then it seems that the same is true for the result of functions like fromInteger.
And in fact, I've found some answers to other questions on Stack Overflow that suggest that Haskell has overloading by return type.
However, at least one Haskell programmer has told me that this isn't return type overloading, and is instead an example of "parametric polymorphism, where the parameter is bound by a universal quantifier".
I think what he's getting at is that fromInteger is returning a value from every instance of Num (sort of a nondeterministic type).
That seems like a reasonable interpretation, but as far as I can tell, Haskell never lets us look at more than one of these instance values (thanks in part to the Monomorphism restriction). It also seems like the actual instance who's value we look at can be determined statically. Because of all of this, it seems reasonable to say that in the expression fd (fromInteger 5) the subexpression fromInteger 5 is of type Double, while in the expression fi (fromInteger 5) the subexpression fromInteger 5 is of type Integer.
So, does Haskell have return type overloading?
If not, please provide an example of one of the following:
valid Haskell code that would have different behavior if Haskell had return type overloading
valid Haskell code that would be invalid if Haskell had return type overloading
invalid Haskell code that would be valid if Haskell had return type overloading
Well, one way to look at it is that Haskell translates the return type polymorphism that you're thinking of into parametric polymorphism, using something called the dictionary-passing translation for type classes. (Though this is not the only way to implement type classes or reason about them; it's just the most popular.)
Basically, fromInteger has this type in Haskell:
fromInteger :: Num a => Integer -> a
That might be translated internally into something like this:
fromInteger# :: NumDictionary# a -> Integer -> a
fromInteger# NumDictionary# { fromInteger = method } x = method x
data NumDictionary# a = NumDictionary# { ...
, fromInteger :: Integer -> a
, ... }
So for each concrete type T with a Num instance, there's a NumDictionary# T value that contains a function fromInteger :: Integer -> T, and all code that uses the Num type class is translated into code that takes a dictionary as the argument.
The seminal paper on Haskell-style typeclasses is called "How to make ad-hoc polymorphism less ad hoc". So, the answer to your question is a qualified "yes" -- depending on just how ad hoc you require your return-type overloading to be...
In other words: there is no question that ad hoc polymorphism is relevant to typeclasses, since that was a motivating example for inventing them. But whether you think the result still qualifies as "return-type overloading" depends on the fiddly details of your favored definition.
I'd like to address one small part of your question:
It also seems like the actual instance who's value we look at can be determined statically.
This isn't really accurate. Consider the following wacky data type:
data PerfectlyBalancedTree a
= Leaf a
| Branch (PerfectlyBalancedTree (a,a))
deriving (Eq, Ord, Show, Read)
Let's gawk at that type for a second first before we move on to the good bits. Here are a few typical values of the type PerfectlyBalancedTree Integer:
Leaf 0
Branch (Leaf (0, 0))
Branch (Branch (Leaf ((0,0),(0,0))))
Branch (Branch (Branch (Leaf (((0,0),(0,0)),((0,0),(0,0))))))
In fact, you can visualize any value of this type as being an initial sequence of n Branch tags followed by a "we're finally done, thank goodness" Leaf tag followed by a 2^n-tuple of the contained type. Cool.
Now, we're going to write a function to parse a slightly more convenient representation for these. Here's a couple example invocations:
*Main> :t fromString
fromString :: String -> PerfectlyBalancedTree Integer
*Main> fromString "0"
Leaf 0
*Main> fromString "b(42,69)"
Branch (Leaf (42,69))
*Main> fromString "bbb(((0,0),(0,0)),((0,0),(0,0)))"
Branch (Branch (Branch (Leaf (((0,0),(0,0)),((0,0),(0,0))))))
Along the way, it will be convenient to write a slightly more polymorphic function. Here it is:
fromString' :: Read a => String -> PerfectlyBalancedTree a
fromString' ('b':rest) = Branch (fromString' rest)
fromString' leaf = Leaf (read leaf)
Now our real function is just the same thing with a different type signature:
fromString :: String -> PerfectlyBalancedTree Integer
fromString = fromString'
But wait a second... what just happened here? I slipped something by you big time! Why didn't we just write this directly?
fromStringNoGood :: String -> PerfectlyBalancedTree Integer
fromStringNoGood ('b':rest) = Branch (fromStringNoGood rest)
fromStringNoGood leaf = Leaf (read leaf)
The reason is that in the recursive call, fromStringNoGood has a different type. It's not being called on to return a PerfectlyBalancedTree Integer, it's being called on to return a PerfectlyBalancedTree (Integer, Integer). We could write ourselves such a function...
fromStringStillNoGood :: String -> PerfectlyBalancedTree Integer
fromStringStillNoGood ('b':rest) = Branch (helper rest)
fromStringStillNoGood leaf = Leaf (read leaf)
helper :: String -> PerfectlyBalancedTree (Integer, Integer)
helper ('b':rest) = Branch ({- ... what goes here, now? -})
helper leaf = Leaf (read leaf)
... but this way lies an infinite regress into writing deeperly and deeperly nested types.
The problem is that, even though we're interested in a monomorphic top-level function, we nevertheless cannot determine statically what type read is being called at in the polymorphic function it uses! The data we're passed determines what type of tuple read should return: more bs in the String means a deeper-nested tuple.
You're right: Haskell does have overloading and it provides it through its type-class mechanism.
Consider the following signatures:
f :: [a] -> a
g :: Num a => [a] -> a
The first signature tells you that given a list of elements of any type a, f will produce a value of type a. This means that the implementation of f cannot make any assumptions about the type a and what operations it admits. This is an example of parametric polymorphism. A moment's reflection reveals that there are actually very little options for implementing f: the only thing you can do is select an element from the provided list. Conceptually, there is a single generic implementation of f that works for all types a.
The second signatures tells you that given a list of elements of some type a that belongs to the type class Num, g will produce a value of that type a. This means that the implementation of g can consume, produce, and manipulate values of type a using all operations that come with the type class Num. For example, g can add or multiply the elements of the list, select the minimum of the list, return a lifted constant, ... This is an example of overloading, which is typically taken to be a form of ad-hoc polymorphism (the other main form being coercion). Conceptually, there is a different implementation for g for all types a in Num.
It has return type overloading. For a good example see the Read function. It has the type Read a => String -> a. It can read and return anything that implements the read type class.

Resources