Function to measure size of a binary tree - haskell

When I tried size (N a (left a) (right a)) instead of size (N a left right), GHCi told me that this line conflicts with the definition. I am not sure why, because in my data declaration it is N a (Tree a) (Tree a). size is a function that counts the number of nodes in a binary tree.
data Tree a = Nil | N a (Tree a) (Tree a) deriving (Show, Read, Eq)
size :: Tree Int -> Int
size Nil = 0
size (N _ left right) = 1 + size left + size right

Regarding trying size (N a (left a) (right a)) instead of size (N a left right):
left and right in this case are expressions of type Tree Int.
a is not a known variable or type in this context.
With the definition size (N a left right), a is a variable bound to a value of type Int.

To help you see what's going on, you could write your match on internal nodes so that it names the left and right subtrees along with their respective values:
size (N _ left@(N a _ _) right@(N b _ _)) = 1 + size left + size right
Section 3.17.1 “Patterns” describes what is happening with the at signs, which allow the programmer to name the left and right subtrees.
Patterns of the form var@pat are called as-patterns, and allow one to use var as a name for the value being matched by pat.
This approach is inelegant for a number of reasons, though.
left and right are already constrained to be of type Tree because of the declaration of the Tree algebraic datatype.
Much worse, you’d also have to define the other three cases of size for either one or two Nil arguments.
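To make that concrete, here is a sketch of what the as-pattern version would force you to write once the Nil cases are included (the names are made up for illustration; the three-line size above avoids all of this):
sizeVerbose :: Tree Int -> Int
sizeVerbose Nil                           = 0
sizeVerbose (N _ Nil Nil)                 = 1
sizeVerbose (N _ Nil r@(N _ _ _))         = 1 + sizeVerbose r
sizeVerbose (N _ l@(N _ _ _) Nil)         = 1 + sizeVerbose l
sizeVerbose (N _ l@(N _ _ _) r@(N _ _ _)) = 1 + sizeVerbose l + sizeVerbose r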
Section 3.17.2 Informal Semantics of Pattern Matching outlines the cases for how the language handles patterns. Of especial note to you in the context of this question are
1. Matching the pattern var against a value v always succeeds and binds var to v.
and
5. Matching the pattern con pat1 … patn against a value, where con is a constructor defined by data, depends on the value:
If the value is of the form con v1 … vn, sub-patterns are matched left-to-right against the components of the data value; if all matches succeed, the overall match succeeds; the first to fail or diverge causes the overall match to fail or diverge, respectively.
If the value is of the form con′ v1 … vn, where con is a different constructor to con′, the match fails.
If the value is ⊥, the match diverges.
The first is how you want to do it and how you wrote it in your question, by binding the left and right subtrees to variables. Your first attempt looked vaguely like matching against further constructor applications, and it bound the name a more than once in the same pattern, which is why GHC complained about conflicting definitions.
Haskell pattern matching can be more sophisticated, e.g., view patterns. For learning exercises, master the basics first.
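As an aside, here is a minimal view-pattern sketch on this Tree type (requires the ViewPatterns extension; children is a helper made up purely for illustration, not something you need for this exercise):
{-# LANGUAGE ViewPatterns #-}

-- children lists the immediate subtrees of a node.
children :: Tree a -> [Tree a]
children Nil       = []
children (N _ l r) = [l, r]

size' :: Tree a -> Int
size' (children -> []) = 0
size' (children -> ts) = 1 + sum (map size' ts)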

Related

Is there a way to avoid copying the whole search path of a binary tree on insert?

I've just started working my way through Okasaki's Purely Functional Data Structures, but have been doing things in Haskell rather than Standard ML. However, I've come across an early exercise (2.5) that's left me a bit stumped on how to do things in Haskell:
Inserting an existing element into a binary search tree copies the entire search path
even though the copied nodes are indistinguishable from the originals. Rewrite insert using exceptions to avoid this copying. Establish only one handler per insertion rather than one handler per iteration.
Now, my understanding is that ML, being an impure language, gets by with a conventional approach to exception handling not so different from, say, Java's, so you can accomplish it with something like this:
datatype Tree = E | T of Tree * int * Tree

exception ElementPresent

fun insert (x, t) =
  let fun go E = T (E, x, E)
        | go (T (l, y, r)) =
            if x < y then T (go l, y, r)
            else if y < x then T (l, y, go r)
            else raise ElementPresent
  in go t
  end
  handle ElementPresent => t
I don't have an ML implementation, so this may not be quite right in terms of the syntax.
My issue is that I have no idea how this can be done in Haskell, outside of doing everything in the IO monad, which seems like cheating and even if it's not cheating, would seriously limit the usefulness of a function which really doesn't do any mutation. I could use the Maybe monad:
data Tree a = Empty | Fork (Tree a) a (Tree a)
    deriving (Show)

insert :: (Ord a) => a -> Tree a -> Tree a
insert x t = maybe t id (go t)
  where go Empty = return (Fork Empty x Empty)
        go (Fork l y r)
          | x < y     = do l' <- go l; return (Fork l' y r)
          | x > y     = do r' <- go r; return (Fork l y r')
          | otherwise = Nothing
This means everything winds up wrapped in Just on the way back up when the element isn't found, which requires more heap allocation, and sort of defeats the purpose. Is this allocation just the price of purity?
EDIT to add: A lot of why I'm wondering about the suitability of the Maybe solution is that the optimization described only seems to save you all the constructor calls you would need in the case where the element already exists, which means heap allocations proportional to the length of the search path. The Maybe also avoids those constructor calls when the element already exists, but then you get a number of Just constructor calls equal to the length of the search path. I understand that a sufficiently smart compiler could elide all the Just allocations, but I don't know if, say, the current version of GHC is really that smart.
In terms of cost, the ML version is actually very similar to your Haskell version.
Every recursive call in the ML version results in a stack frame. The same is true in the
Haskell version. This is going to be proportional in size to the path that you traverse in
the tree. Also, both versions will of course allocate new nodes for the entire path if an insertion is actually performed.
In your Haskell version, every recursive call might also eventually result in the
allocation of a Just node. This will go on the minor heap, which is just a block of
memory with a bump pointer. For all practical purposes, GHC's minor heap is roughly equivalent in
cost to the stack. Since these are short-lived allocations, they won't normally end up
being moved to the major heap at all.
GHC generally cannot elide path copying in cases like that. However, there is a way to do it manually, without incurring any of the indirection/allocation costs of Maybe. Here it is:
{-# LANGUAGE MagicHash #-}

import GHC.Prim (reallyUnsafePtrEquality#)

data Tree a = Empty | Fork (Tree a) a (Tree a)
    deriving (Show)

insert :: (Ord a) => a -> Tree a -> Tree a
insert x Empty = Fork Empty x Empty
insert x node@(Fork l y r)
  | x < y = let l' = insert x l in
      case reallyUnsafePtrEquality# l l' of
        1# -> node
        _  -> Fork l' y r
  | x > y = let r' = insert x r in
      case reallyUnsafePtrEquality# r r' of
        1# -> node
        _  -> Fork l y r'
  | otherwise = node
The pointer equality function does exactly what's in the name. Here it is safe because even if the equality returns a false negative we only do a bit of extra copying, and nothing worse happens.
It's not the most idiomatic or prettiest Haskell, but the performance benefits can be significant. In fact, this trick is used very frequently in unordered-containers.
As fizruk indicates, the Maybe approach is not significantly different from what you'd get in Standard ML. Yes, the whole path is copied, but the new copy is discarded if it turns out not to be needed. The Just constructor itself may not even be allocated on the heap—it can't escape from insert, let alone the module, and you don't do anything weird with it, so the compiler is free to analyze it to death.
Edit
There are efficiency problems, now that I think of it. Your use of Maybe conceals the fact that you're actually making two passes—one down to find the insertion point and one up to build the tree. The solution to this is to drop Maybe Tree in favor of (Tree,Bool) and use strictness annotations, or to switch to continuation-passing style. Also, if you choose to stay with the three-way logic, you may want to use the three-way comparison function. Alternatively, you can go all the way to the bottom each time and check later if you hit a duplicate.
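For what it's worth, here is one way the (Tree, Bool) idea could look, also using the three-way compare (a sketch with made-up names, not a definitive implementation):
-- The Bool reports whether an insertion actually happened, so unchanged
-- subtrees (and the whole tree) can be reused instead of copied.
insertFlag :: Ord a => a -> Tree a -> Tree a
insertFlag x t0 = let (t', changed) = go t0 in if changed then t' else t0
  where
    go Empty = (Fork Empty x Empty, True)
    go node@(Fork l y r) = case compare x y of
      LT -> let (l', c) = go l in (if c then Fork l' y r else node, c)
      GT -> let (r', c) = go r in (if c then Fork l y r' else node, c)
      EQ -> (node, False)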
If you have a predicate that checks whether the key is already in the tree, you can look before you leap:
insert x t = if contains t x then t else insert' x t
This traverses the tree twice, of course. Whether that's as bad as it sounds should be determined empirically: it might just load the relevant part of the tree into the cache.
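For completeness, contains here would just be the usual search, and insert' a plain insert with no membership check (a sketch, using the Tree type from the question):
contains :: Ord a => Tree a -> a -> Bool
contains Empty        _ = False
contains (Fork l y r) x = case compare x y of
  LT -> contains l x
  GT -> contains r x
  EQ -> True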

How can quotient types help safely expose module internals?

Reading up on quotient types and their use in functional programming, I came across this post. The author mentions Data.Set as an example of a module which provides a ton of functions which need access to module's internals:
Data.Set has 36 functions, when all that are really needed to ensure the meaning of a set ("These elements are distinct") are toList and fromList.
The author's point seems to be that we need to "open up the module and break the abstraction" if we forgot some function which can be implemented efficiently only by using the module's internals.
He then says
We could alleviate all of this mess with quotient types.
but gives no explanation to that claim.
So my question is: how are quotient types helping here?
EDIT
I've done a bit more research and found a paper, "Constructing Polymorphic Programs with Quotient Types". It elaborates on declaring quotient containers and mentions the word "efficient" in the abstract and introduction. But if I haven't misread, it does not give any example of an efficient representation "hiding behind" a quotient container.
EDIT 2
A bit more is revealed in Chapter 3 of the paper "Programming in Homotopy Type Theory". The fact that a quotient type can be implemented as a dependent sum is used. Views on abstract types are introduced (which look very similar to type classes to me) and some relevant Agda code is provided. Yet the chapter focuses on reasoning about abstract types, so I'm not sure how this relates to my question.
I recently made a blog post about quotient types, and I was led here by a comment. The blog post may provide some additional context in addition to the papers referenced in the question.
The answer is actually pretty straightforward. One way to arrive at it is to ask the question: why are we using an abstract data type in the first place for Data.Set?
There are two distinct and separable reasons. The first reason is to hide the internal type behind an interface so that we can substitute a completely new type in the future. The second reason is to enforce implicit invariants on values of the internal type. Quotient types and their dual, subset types, allow us to make the invariants explicit and enforced by the type checker, so that we no longer need to hide the representation. So let me be very clear: quotient (and subset) types do not provide you with any implementation hiding. If you implement Data.Set with quotient types using lists as your representation, then later decide you want to use trees, you will need to change all code that uses your type.
Let's start with a simpler example (leftaroundabout's). Haskell has an Integer type but not a Natural type. A simple way to specify Natural as a subset type using made up syntax would be:
type Natural = { n :: Integer | n >= 0 }
We could implement this as an abstract type using a smart constructor that threw an error when given a negative Integer. This type says that only a subset of the values of type Integer are valid. Another approach we could use to implement this type is to use a quotient type:
type Natural = Integer / ~ where n ~ m = abs n == abs m
Any function h :: X -> T for some type T induces a quotient type on X quotiented by the equivalence relation x ~ y = h x == h y. Quotient types of this form are more easily encoded as abstract data types. In general, though, there may not be such a convenient function, e.g.:
type Pair a = (a, a) / ~ where (a, b) ~ (x, y) = a == x && b == y || a == y && b == x
(As to how quotient types relate to setoids, a quotient type is a setoid that enforces that you respect its equivalence relation.) This second definition of Natural has the property that there are two values that represent 2, say. Namely, 2 and -2. The quotient type aspect says we are allowed to do whatever we want with the underlying Integer, so long as we never produce a result that differentiates between these two representatives. Another way to see this is that we can encode a quotient type using subset types as:
X/~ = forall a. { f :: X -> a | forEvery (\(x, y) -> x ~ y ==> f x == f y) } -> a
Unfortunately, that forEvery is tantamount to checking equality of functions.
Zooming back out, subset types add constraints on producers of values and quotient types add constraints on consumers of values. Invariants enforced by an abstract data type may be a mixture of these. Indeed, we may decide to represent a Set as the following:
data Tree a = Empty | Branch (Tree a) a (Tree a)
type BST a = { t :: Tree a | isSorted (toList t) }
type Set a = { t :: BST a | noDuplicates (toList t) } / ~
where s ~ t = toList s == toList t
Note, nothing about this ever requires us to actually execute isSorted, noDuplicates, or toList. We "merely" need to convince the type checker that the implementations of functions on this type would satisfy these predicates. The quotient type allows us to have a redundant representation while enforcing that we treat equivalent representations in the same way. This doesn't mean we can't leverage the specific representation we have to produce a value, it just means that we must convince the type checker that we would have produced the same value given a different, equivalent representation. For example:
maximum :: Set a -> a
maximum s = exposing s as t in go t
  where go Empty = error "maximum of empty Set"
        go (Branch _ x Empty) = x
        go (Branch _ _ r) = go r
The proof obligation for this is that the right-most element of any binary search tree with the same elements is the same. Formally, it's go t == go t' whenever toList t == toList t'. If we used a representation that guaranteed the tree would be balanced, e.g. an AVL tree, this operation would be O(log N) while converting to a list and picking the maximum from the list would be O(N). Even with this representation, this code is strictly more efficient than converting to a list and getting the maximum from the list. Note, that we could not implement a function that displayed the tree structure of the Set. Such a function would be ill-typed.
I'll give a simpler example where it's reasonably clear. Admittedly I myself don't really see how this would translate to something like Set, efficiently.
data Nat = Nat (Integer / abs)
To use this safely, we must be sure that any function Nat -> T (with some non-quotient T, for simplicity's sake) does not depend on the actual integer value, but only on its absolute. To do so, it's not really necessary to hide Integer completely; it would be sufficient to prevent you from matching on it directly. Instead, the compiler might rewrite the matches, e.g.
even' :: Nat -> Bool
even' (Nat 0) = True
even' (Nat 1) = False
even' (Nat n) = even' . Nat $ n - 2
could be rewritten to
even' (Nat n') = case abs n' of
  [|abs 0|] -> True
  [|abs 1|] -> False
  n         -> even' . Nat $ n - 2
Such a rewriting would point out equivalence violations, e.g.
bad (Nat 1) = "foo"
bad (Nat (-1)) = "bar"
bad _ = undefined
would rewrite to
bad (Nat n') = case n' of
  1 -> "foo"
  1 -> "bar"
  _ -> undefined
which is obviously an overlapped pattern.
Disclaimer: I just read up on quotient types upon reading this question.
I think the author's just saying that sets can be described as quotient types over lists, i.e. (making up some Haskell-like syntax):
data Set a = Set [a] / (sort . nub) deriving (Eq)
Ie, a Set a is just a [a] with equality between two Set a's determined by whether the sort . nub of the underlying lists are equal.
We could do this explicitly like this, I guess:
import Data.List

data Set a = Set [a] deriving (Show)

instance (Ord a, Eq a) => Eq (Set a) where
  (Set xs) == (Set ys) = (sort $ nub xs) == (sort $ nub ys)
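To make the quotient reading concrete, any function on this Set should only depend on sort . nub of the underlying list. For example (a sketch; these function names are made up, not the author's):
member :: Eq a => a -> Set a -> Bool
member x (Set xs) = x `elem` xs        -- depends only on membership, so it respects the equivalence

toList :: Ord a => Set a -> [a]
toList (Set xs) = sort (nub xs)        -- normalises, so equivalent representations give the same list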
Not sure if this is actually what the author intended as this isn't a particularly efficient way of implementing a set. Someone can feel free to correct me.

Tree Fold operation?

I am taking a class in Haskell, and we need to define the fold operation for a tree defined by:
data Tree a = Lf a | Br (Tree a) (Tree a)
I cannot seem to find any information on the "tfold" operation or really what it is supposed to do. Any help would be greatly appreciated.
I always think of folds as a way of systematically replacing constructors by other functions. So, for instance, if you have a do-it-yourself List type (defined as data List a = Nil | Cons a (List a)), the corresponding fold can be written as:
listfold nil cons Nil = nil
listfold nil cons (Cons a b) = cons a (listfold nil cons b)
or, maybe more concisely, as:
listfold nil cons = go where
  go Nil = nil
  go (Cons a b) = cons a (go b)
The type of listfold is b -> (a -> b -> b) -> List a -> b. That is to say, it takes two 'replacement constructors'; one telling how a Nil value should be transformed into a b, another replacement constructor for the Cons constructor, telling how the first value of the Cons constructor (of type a) should be combined with a value of type b (why b? because the fold has already been applied recursively!) to yield a new b, and finally a List a to apply the whole she-bang to - with a result of b.
In your case, the type of tfold should be (a -> b) -> (b -> b -> b) -> Tree a -> b by analogous reasoning; hopefully you'll be able to take it from there!
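For reference once you've had a go yourself, here is a sketch of an implementation that follows exactly the listfold pattern above (one possible way to write it, not the only one):
tfold :: (a -> b) -> (b -> b -> b) -> Tree a -> b
tfold leaf branch = go
  where
    go (Lf a)   = leaf a              -- replace each Lf by the leaf function
    go (Br l r) = branch (go l) (go r) -- replace each Br by the branch function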
Imagine you define that a tree should be shown in the following manner,
<1 # <<2#3> # <4#5>>>
Folding such a tree means replacing each branch node with an actual supplied operation to be performed on the results of fold recursively performed on the data type's constituents (here, the node's two child nodes, which are themselves, each, a tree), for example with +, producing
(1 + ((2+3) + (4+5)))
So, for leaves you should just take the values inside them, and for branches, recursively apply the fold for each of the two child nodes, and combine the two results with the supplied function, the one with which the tree is folded. (edit:) When "taking" values from leaves, you could additionally transform them, applying a unary function. So in general, your folding will need two user-provided functions, one for leaves, Lf, and another one for combining the results of recursively folding the tree-like constituents (i.e. branches) of the branching nodes, Br.
Your tree data type could have been defined differently, e.g. with possibly empty leaves, and with internal nodes also carrying the values. Then you'd have to provide a default value to be used instead of the empty leaf nodes, and a three-way combination operation. Still you'd have the fold defined by two functions corresponding to the two cases of the data type definition.
Another distinction to realize here is, what you fold, and how you fold it. I.e. you could fold your tree in a linear fashion, (1+(2+(3+(4+5)))) == ((1+) . (2+) . (3+) . (4+) . (5+)) 0, or you could fold a linear list in a tree-like fashion, ((1+2)+((3+4)+5)) == (((1+2)+(3+4))+5). It is all about how you parenthesize the resulting "expression". Of course in the classic take on folding the expression's structure follows that of the data structure being folded; but variations do exist. Note also, that the combining operation might not be strict, and the "result" type it consumes/produces might express compound (lists and such), as well as atomic (numbers and such), values.
(update 2019-01-26) This re-parenthesization is possible if the combining operation is associative, like +: (a1+a2)+a3 == a1+(a2+a3). A data type together with such associative operation and a "zero" element (a+0 == 0+a == a) is known as "Monoid", and the notion of folding "into" a Monoid is captured by the Foldable type class.
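For this Tree type, that notion can be made concrete with a Foldable instance (a sketch):
instance Foldable Tree where
  foldMap f (Lf a)   = f a
  foldMap f (Br l r) = foldMap f l <> foldMap f r

-- e.g. sum (Br (Lf 1) (Br (Lf 2) (Lf 3))) == 6, via the Foldable machinery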
A fold on a list is a reduction from a list into a single element. It takes a function and then applies that function to elements, two at a time, until it has only one element. For example:
Prelude> foldl1 (+) [3,5,6,7]
21
...is found by doing operations one-by-one:
3 + 5 == 8
8 + 6 == 14
14 + 7 == 21
A fold can be written
ourFold :: (a -> a -> a) -> [a] -> a
ourFold _ [a] = a -- pattern-match for a single-element list. Our work is done.
ourFold aFunction (x0:x1:xs) = ourFold aFunction ((aFunction x0 x1):xs)
A tree fold would do this, but move up or down the branches of the tree. To do this, it first needs to pattern-match to see whether you're operating on an Lf or a Br.
treeFold _ (Lf a) = Lf a -- You can't do much to a one-leaf tree
treeFold f (Br a b) = -- ...
The rest is left up to you, since it's homework. If you're stuck, try first thinking of what the type should be.
A fold is an operation which "compacts" a data structure into a single value using an operation. There are variations depending on whether you have a start value and on the order of evaluation (e.g. for lists you have foldl, foldr, foldl1 and foldr1), so the correct implementation depends on your assignment.
I guess your tfold should simply replace all leaves with their values, and all branches with applications of the given operation. Draw an example tree with some numbers, and "collapse" it given an operation like (+), as in the sketch below. After this, it should be easy to write a function doing the same.
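For instance, with a tfold of the type suggested above (a sketch):
example :: Tree Int
example = Br (Lf 1) (Br (Lf 2) (Lf 3))

-- tfold id (+) example  ==  1 + (2 + 3)  ==  6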

Haskell add value to tree

I'm trying to make a function which allows me to add a new value to a tree IF the value at the given path is equal to ND (no data); this was my first attempt.
It checks the value etc., but the problem is I want to be able to print the modified tree with the new data. Can anyone give me any pointers? I have also tried making a second function that checks the path to see if it's OK to add data, but I'm just lost as to how to print out the modified tree.
As iuliux points out, your problem is that you are treating your BTree as though it were a mutable structure. Remember, functions in Haskell take arguments and return a value. That is all. So when you "map over" a list or traverse a tree, your function needs to return a new tree.
The code you have is traversing the recursive tree and only returning the last leaf. Imagine for now that the leaf at the end of the path will always be ND. This is what you want:
add :: a -> Path -> Btree a -> Btree a
add da xs ND = Data da
add _ [] _ = error "You should make sure this doesn't happen or handle it"
add da (x:xs) (Branch st st2) =
  case x of
    L -> Branch (add da xs st) st2
    R -> Branch st (add da xs st2)
Notice how in your original code you discard the Branch you pattern match against, when what you need to do is return it "behind you" as it were.
Now, on to the issue of handling situations where the leaf you arrive at is not an ND constructor:
This type of problem is common in functional programming. How can you return your recursive data structure "as you go" when the final result depends on a leaf far down the tree?
One solution for the trickiest of cases is the Zipper, which is a data structure that lets you go up, down, and sideways as you please. For your case that would be overkill.
I would suggest you change your function to the following:
add :: a -> Path -> Btree a -> Maybe (Btree a)
which means at each level you must return a Maybe (Btree a). Then use the Functor instance of Maybe in your recursive calls. Notice:
fmap (+1) (Just 2) == Just 3
fmap (+1) (Nothing) == Nothing
You should try to puzzle out the implementation for yourself!
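(If you do get stuck, here is one possible shape for it, sketched with fmap on the Branch/Data/ND constructors and the L/R path steps used above; your actual Path and Btree types may differ:)
add :: a -> Path -> Btree a -> Maybe (Btree a)
add da _      ND              = Just (Data da)
add da (L:xs) (Branch st st2) = fmap (\st'  -> Branch st' st2) (add da xs st)
add da (R:xs) (Branch st st2) = fmap (\st2' -> Branch st st2') (add da xs st2)
add _  _      _               = Nothing  -- existing data in the way, or path and tree don't match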
I'm no expert in Haskell, but functional programming only works with functions, so pretty much everything is a function.
Now, your function takes some input and returns something without modifying the input. You have to retain the returned tree somewhere, and that will be your new tree, the one with the inserted element in it.
We really need to see the Path and Error data types to answer your question, but you can print out your trees using the IO Monad:
main :: IO ()
main = do
  let b  = Branch ND (Branch (Data 1) (Data 2))
  let b1 = add 10 [L] b  -- actual call depends on definition of Path
  (putStrLn . show) b1

Beginning Haskell - getting "not in scope: data constructor" error

I'm going through the problems in the Haskell O'Reilly book. The problem I am working on is
Using the binary tree type that we defined earlier in this chapter,
write a function that will determine the height of the tree. The height
is the largest number of hops from the root to an Empty. For example, the
tree Empty has height zero; Node "x" Empty Empty has height one;
Node "x" Empty (Node "y" Empty Empty) has height two; and so on.
I'm writing my code in a file called ch3.hs. Here's my code:
36 data Tree a = Node a (Tree a) (Tree a)
37 | Empty
38 deriving (Show)
39
40 --problem 9:Determine the height of a tree
41 height :: Tree -> Int
42 height (Tree node left right) = if (left == Empty && right == Empty) then 0 else max (height left) (height right)
I open ghci in the terminal and type :load ch3.hs. When I do that I get the following error:
Prelude> :load ch3.hs
[1 of 1] Compiling Main ( ch3.hs, interpreted )
ch3.hs:42:7: Not in scope: data constructor `Tree'
Failed, modules loaded: none.
I expect that the Tree data constructor should be there, because I defined it in the lines above the height method. But when I try to load the file, I'm told that the data constructor is not in scope. I appreciate your help and explanation of why this error occurs. Thanks,
Kevin
Change
height (Tree node left right)
to
height (Node node left right)
This is because pattern matching works on the constructors of the algebraic data type (ADT). Tree is not a constructor; it is the name of the ADT.
By the way, you have to comment out your type signature to compile the code, because it also contains an error.
You can then check the inferred type via
:t height
in ghci or hugs.
Your code is wrong, on several levels. It looks like you misunderstood algebraic data types.
The type signature is wrong: a Tree is always a Tree of a specific type, which you called a in its declaration, and which may be any type (since you didn't constrain it). So height has to take a Tree of some type too, a Tree SomeType. You can and should use the most generic type for SomeType, i.e. a type variable like a.
When pattern matching, you name a specific constructor - Node a (Tree a) (Tree a) or Empty - to match against, not the type as a whole. So height (Node ...) would match a Node, height (Empty) would match an Empty, and height (Tree ...) would try to match a constructor named Tree, but there is none. That's the error message you receive.
You never compare (via ==) against a constructor. It would actually work here if you wrote deriving (Show, Eq), but you should use pattern matching to determine whether you have reached an Empty.
Which leads to: you're only matching Node, not Empty - you should add a clause for Empty.
Also, your function still returns 0 for all inputs if you fix all the above issues. You never return anything but 0 or the max of the children's heights - which can, in turn, only return 0 or the max of their children's heights, and so on ad infinitum. You have to increment the result at each level ;)
You pattern-match against constructors, i.e. the cases, of your Tree ADT. Tree is just what sums them all up.
It's much more straightforward like this, and, most of all, correct:
height Empty = 0
height (Node _ l r) = 1 + max (height l) (height r)
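A quick sanity check against the examples given in the exercise:
-- height Empty                                    == 0
-- height (Node "x" Empty Empty)                   == 1
-- height (Node "x" Empty (Node "y" Empty Empty))  == 2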

Resources