Accessing values in a Haskell custom data type

I'm very new to haskell and need to use a specific data type for a problem I am working on.
data Tree a = Leaf a | Node [Tree a]
    deriving (Show, Eq)
So when I make a value of this type, e.g. Node [Leaf 1, Leaf 2, Leaf 3], how do I access the children? It won't let me use head, tail, or indexing with !!.

You perform pattern matching. For example if you want the first child, you can use:
firstChild :: Tree a -> Maybe (Tree a)
firstChild (Node (h:_)) = Just h
firstChild _ = Nothing
Here we wrap the answer in a Maybe type, since it is possible that we process a Leaf x or a Node [], such that there is no first child.
Or we can for instance obtain the i-th item with:
iThChild :: Int -> Tree a -> Tree a
iThChild i (Node cs) = cs !! i
So here we unwrap the Node constructor, obtain the list of children cs, and then perform cs !! i to obtain the i-th child. Note however that (!!) :: [a] -> Int -> a is usually a bit of an anti-pattern: it is unsafe, since we have no guarantee that the list contains enough elements, and using length is an anti-pattern as well, since the list can have infinite length, so we cannot do such a bounds check. Note also that iThChild as written has no equation for Leaf, so it is a partial function.
Usually if one writes algorithms in Haskell, one tends to make use of linear access, and write total functions: functions that always return something.
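A total version of the i-th child lookup might look like the sketch below. The name safeChild is made up for this example. It returns Nothing for a Leaf, for a negative index, and for an index past the end, and it never calls length, so it also works when the list of children is infinite.

```haskell
import Data.Maybe (listToMaybe)

data Tree a = Leaf a | Node [Tree a]
    deriving (Show, Eq)

-- | Hypothetical total variant of iThChild: every input gets an answer.
-- 'drop' only walks as far as the index, so no length check is needed.
safeChild :: Int -> Tree a -> Maybe (Tree a)
safeChild i (Node cs)
    | i >= 0 = listToMaybe (drop i cs)
safeChild _ _ = Nothing  -- Leaf, or a negative index
```

For example, safeChild 1 (Node [Leaf 1, Leaf 2, Leaf 3]) gives Just (Leaf 2), while safeChild 5 on the same tree gives Nothing.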

Related

The simplest way to generically traverse a tree in haskell

Suppose I used language-javascript library to build AST in Haskell. The AST has nodes of different types, and each node can have fields of those different types.
And each type can have numerous constructors. (All the types instantiate Data, Eq and Show).
I would like to count each type's constructor occurrences in the tree. I could use toConstr to get the constructor, and ideally I'd make a Tree -> [Constr] function first (then counting is easy).
There are different ways to do that. Obviously pattern matching is too verbose (imagine around 3 types with 9-28 constructors).
So I'd like to use a generic traversal, and I tried to find the solution in SYB library.
There is an everywhere function, which doesn't suit my needs since I don't need a Tree -> Tree transformation.
There is gmapQ, which seems suitable in terms of its type, but as it turns out it's not recursive.
The most viable option so far is everywhereM. It still does the useless transformation, but I can use a Writer to collect toConstr results. Still, this way doesn't really feel right.
Is there an alternative that will not perform a useless (for this task) transformation and still deliver the list of constructors? (The order of their appearance in the tree doesn't matter for now)
Not sure if it's the simplest, but:
> data T = L | B T T deriving Data
> everything (++) (const [] `extQ` (\x -> [toConstr (x::T)])) (B L (B (B L L) L))
[B,L,B,B,L,L,L]
Here ++ says how to combine the results from subterms.
const [] is the base case for subterms that are not of type T. For those of type T, instead, we apply \x -> [toConstr (x::T)].
If you have multiple tree types, you'll need to extend the query using
const [] `extQ` (handleType1) `extQ` (handleType2) `extQ` ...
This is needed to identify the types for which we want to take the constructors. If there are a lot of types, probably this can be made shorter in some way.
Note that the code above is not very efficient on large trees since using ++ in this way can lead to quadratic complexity. It would be better, performance wise, to return a Data.Map.Map Constr Int. (Even if we do need to define some Ord Constr for that)
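As a rough sketch of the Map-based idea, one can sidestep the missing Ord Constr instance by keying the map on the constructor name from showConstr. The function countConstrs below is hypothetical and uses only Data.Data from base plus containers: it makes gmapQ recursive by hand instead of using everything. Two assumptions to be aware of: constructors with the same name in different types end up in the same bucket, and every subterm that has a Data instance is counted, including primitive fields such as Int or Char if the tree has any.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, gmapQ, showConstr, toConstr)
import qualified Data.Map.Strict as Map

data T = L | B T T deriving (Data, Show)

-- | Hypothetical helper: count constructor occurrences by name.
-- gmapQ visits only the immediate children, so we recurse ourselves.
countConstrs :: Data a => a -> Map.Map String Int
countConstrs x =
    Map.unionsWith (+)
        (Map.singleton (showConstr (toConstr x)) 1 : gmapQ countConstrs x)
```

On the example term B L (B (B L L) L) this yields a map with "B" at 3 and "L" at 4, which matches the list [B,L,B,B,L,L,L] above.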
universe from the Data.Generics.Uniplate.Data module can give you a list of all the sub-trees of the same type. So using Ilya's example:
data T = L | B T T deriving (Data, Show)
tree :: T
tree = B L (B (B L L) L)
λ> import Data.Generics.Uniplate.Data
λ> universe tree
[B L (B (B L L) L),L,B (B L L) L,B L L,L,L,L]
λ> fmap toConstr $ universe tree
[B,L,B,B,L,L,L]

Haskell binary tree max int?

I'm trying to write a haskell function that will return the max int inside a binary tree of integers. My binary tree is defined as follows:
data Tree = Node Int Tree Tree | Leaf Int
    deriving (Eq,Show)
The way I understand it this declaration is saying that for the 'Tree' data type, it can either be a single leaf int, or be a subtree containing two more trees.
So my maxInt function will look something like this ( I think )
maxInt :: Tree -> Int --maxInt function receives Tree, returns int
maxInt --something to detect if the Tree received is empty
--if only one node, return that int
--look through all nodes, find largest
and so when my function is given something like
maxInt (Node 5 (Leaf 7) (Leaf 2)), the correct value for maxInt to return would be 7.
I'm new to Haskell and don't really know where to start with this problem. I would really appreciate some guidance. Thank you.
Let me start it for you:
maxInt :: Tree -> Int
maxInt (Leaf x) = ?
maxInt (Node x l r) = ?
You may find it helpful to use the standard function max, which takes two arguments and returns their maximum:
max 3 17 = 17
To begin with, we have this datatype:
data Tree = Node Int Tree Tree | Leaf Int
    deriving (Eq,Show)
That means, we have two constructors for things of type Tree: either we have a Leaf with a single Int value, or we have a Node which allows us to represent bigger trees in a recursive fashion.
So, for example we can have these trees:
Leaf 0
And more complex ones:
Node 3 (Leaf 0) (Leaf 4)
Recall that this tree representation has information both in the leaves and in the nodes, so our function will need to take that into account.
You guessed correctly the type of the function maxInt, so you are halfway through!
Since we have a custom datatype, we can define this function by pattern matching.
Pattern matching is, put simply, a way to define a function by equations: on the left side of each equation is one shape of our datatype (either Leaf or Node, in our case), and on the right side is the result value. I'd recommend learning more about it here: pattern matching in Haskell
Hence, we start our function by its type, as you correctly guessed:
maxInt :: Tree -> Int
As we have seen earlier, we will use pattern-matching for this. What would be the first equation, that is, the first pattern-matching case for our function? The simplest tree we have given our datatype is Leaf value. So we start with:
maxInt (Leaf n) = n
Why n as a result? Because we don't have any other value than n in the tree and therefore it's the maximum.
What happens in a more complex case?
maxInt (Node n leftTree rightTree) = ...
Well... the maximum value for the tree (Node n leftTree rightTree) is the largest of three values: n, the maximum of leftTree, and the maximum of rightTree.
Can you write the second equation yourself? I strongly recommend first reading the book chapter linked above. You might also want to read about recursion in Haskell.
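For reference once you have tried it yourself, one possible way to finish the definition is sketched below. It follows the reasoning above directly: a Node's maximum is the largest of its own value and the maxima of its two subtrees. Using maximum on a three-element list is just one choice; nesting two calls to max works equally well.

```haskell
data Tree = Node Int Tree Tree | Leaf Int
    deriving (Eq, Show)

maxInt :: Tree -> Int
-- A leaf holds a single value, so that value is the maximum.
maxInt (Leaf n) = n
-- A node: compare its own value with the maxima of both subtrees.
maxInt (Node n leftTree rightTree) =
    maximum [n, maxInt leftTree, maxInt rightTree]
```

With this, maxInt (Node 5 (Leaf 7) (Leaf 2)) evaluates to 7, as asked in the question.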

Haskell Defining a Binary Tree

I want to define an infinite tree in Haskell using infinitree :: Tree, and I want to set up a pattern that defines what each node should be: each node is one more than its parent. I am struggling with how to set up the tree to begin with, and with how and where to define the pattern for each node.
Thank you
Infinite data structures can generally be defined by functions which call themselves but have no base case. Usually these functions don't need to pattern match on their arguments. For example, a list equal to [1..] can be written as
infiniteList :: [Int]
infiniteList = go 1 where
    go n = n : go (n+1)
You can use the exact same technique for a tree:
data Tree a = Node (Tree a) a (Tree a) | Nil deriving (Show)
infiniteTree :: Tree Int
infiniteTree = go 1 where
    go n = Node (go (2*n)) n (go (2*n+1))
This defines the infinite tree
       1
     /   \
    2     3
   / \   / \
  4   5 6   7
       ...
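Printing infiniteTree directly never finishes, so to look at it you need to cut it off somewhere. A minimal sketch, assuming a made-up helper takeDepth that replaces everything below a given depth with Nil (Eq is derived here only so results can be compared):

```haskell
data Tree a = Node (Tree a) a (Tree a) | Nil
    deriving (Show, Eq)

infiniteTree :: Tree Int
infiniteTree = go 1 where
    go n = Node (go (2*n)) n (go (2*n+1))

-- | Hypothetical helper: keep only the first d levels of a tree.
-- Laziness means only the part we keep is ever built.
takeDepth :: Int -> Tree a -> Tree a
takeDepth d _ | d <= 0 = Nil
takeDepth _ Nil = Nil
takeDepth d (Node l x r) = Node (takeDepth (d - 1) l) x (takeDepth (d - 1) r)
```

For example, takeDepth 2 infiniteTree is Node (Node Nil 2 Nil) 1 (Node Nil 3 Nil).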
A type for infinite binary trees with no leaves:
data Tree a = Tree (Tree a) a (Tree a)
One general pattern for doing this sort of thing is called unfold. For this particular type:
unfold :: (a -> (a,b,a)) -> a -> Tree b
Can you see how to define this function and use it for your purpose?
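If you get stuck, here is one possible definition, offered as a sketch rather than the only answer. The step function returns the seed for the left subtree, the value for the current node, and the seed for the right subtree. The names infinitree (from the question) and leftSpine (a made-up helper for inspecting a finite part of the result) are illustrative.

```haskell
data Tree a = Tree (Tree a) a (Tree a)

unfold :: (a -> (a, b, a)) -> a -> Tree b
unfold f seed = Tree (unfold f l) x (unfold f r)
  where
    (l, x, r) = f seed

-- | Every node is one more than its parent, starting from 1 at the root.
infinitree :: Tree Int
infinitree = unfold (\n -> (n + 1, n, n + 1)) 1

-- | Hypothetical helper: the first k values met when always going left.
leftSpine :: Int -> Tree a -> [a]
leftSpine k (Tree l x _)
    | k <= 0    = []
    | otherwise = x : leftSpine (k - 1) l
```

Here leftSpine 3 infinitree gives [1,2,3].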

Is there a sense of 'object equality' in Haskell?

If I have a singly linked list in Haskell:
data LL a = Empty | Node a (LL a) deriving (Show, Eq)
I can easily implement methods to insert at the end and at the beginning. But what about inserting after or before a particular element? If I have a LL of Integer, can I make a distinction in Haskell between inserting 4 after a particular node containing a 1, rather than the first 1 that it sees when processing the list?
Node 1 (Node 2 (Node 3 (Node 1 Empty)))
I'm curious how an insertAfter method would look that you would be able to specify "insert 5 after this particular node containing a 1". If I wanted to insert after the first node containing 1, would I have to pass in the entire list to specify this, and for the last node, only Node 1 Empty?
I'm not sure if it's right to address this as 'object equality'- but I'm wondering if there's a way to refer to particular elements of a type with the same payload in a data structure like this.
No, there is no such thing. The only way to tell apart values is by their structure; there is no identity like objects in some languages have. That is, there's no way you could tell apart these two values: (Just 5, Just 5) behaves exactly the same as let x = Just 5 in (x, x). Likewise, there is no difference between "this Node 1" and "some other Node 1": they are indistinguishable.
Usually the "solution" to this problem is to think of your problem in some other way so that there's no longer a need to distinguish based on identity (and usually there in fact is no need). But, as mentioned in the comments, you can emulate the "pointer" mechanic of other languages yourself by generating distinct tags of some sort, e.g. increasing integers, and assigning one to each object so that you can tell them apart.
As others have pointed out, in Haskell every value is immutable and there are no objects.
To specify a unique node, you either need to specify it structurally (the first node in the linked list that contains 1, for example) or give every node an extra tag somehow (simulating what happens in an imperative world) so that we can distinguish them.
To structurally distinguish a node from others, we basically need to know the location of that node, e.g. with a zipper, which gives you not only the value at that point but also its "neighborhood".
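To make the zipper idea concrete, here is a minimal sketch for the LL type from the question. All names here (Zipper, fromLL, next, insertAfterFocus, toLL) are made up for illustration. The zipper stores the elements before the focus in reverse order, plus the rest of the list starting at the focused node, so "this particular node" is identified by where you have walked to, not by its payload.

```haskell
data LL a = Empty | Node a (LL a) deriving (Show, Eq)

-- | Hypothetical list zipper: reversed prefix, then the rest from the focus on.
data Zipper a = Zipper (LL a) (LL a) deriving (Show, Eq)

-- | Start with the focus on the first node.
fromLL :: LL a -> Zipper a
fromLL = Zipper Empty

-- | Move the focus one node forward, if there is a focused node to step past.
next :: Zipper a -> Maybe (Zipper a)
next (Zipper before (Node x after)) = Just (Zipper (Node x before) after)
next (Zipper _ Empty) = Nothing

-- | Insert directly after the focused node (or at the end if there is none).
insertAfterFocus :: a -> Zipper a -> Zipper a
insertAfterFocus y (Zipper before (Node x after)) = Zipper before (Node x (Node y after))
insertAfterFocus y (Zipper before Empty) = Zipper before (Node y Empty)

-- | Rebuild the plain list by unwinding the reversed prefix.
toLL :: Zipper a -> LL a
toLL (Zipper Empty rest) = rest
toLL (Zipper (Node x before) rest) = toLL (Zipper before (Node x rest))
```

Stepping next three times on Node 1 (Node 2 (Node 3 (Node 1 Empty))) puts the focus on the second 1, and insertAfterFocus 5 followed by toLL then gives Node 1 (Node 2 (Node 3 (Node 1 (Node 5 Empty)))).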
In more detail, on "giving every node an extra tag":
First of all, you need to make every value an object, which requires generating unique tags at runtime. This is usually done by an allocator. The simplest allocator might just keep an integer and bump it whenever we need to create a new object:
-- | bumps counter
genId :: (Monad m, Functor m, Enum e) => StateT e m e
genId = get <* modify succ
-- | given a value, initializes a new node value
newNode :: (Monad m, Functor m, Enum e) => a -> StateT e m (a,e)
newNode x = genId >>= return . (x,)
And if you want to make an existing linked list work, we need to walk through it and give every node value a tag to make it an object:
-- | tags the linked list with an extra value
tagged :: (Traversable f, Enum e, Monad m, Functor m)
       => f a -> StateT e m (f (a,e))
tagged = traverse newNode
And here is the full demo. It does look maybe "a little" awkward:
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable, TupleSections #-}
import Control.Applicative
import Control.Monad.State hiding (mapM_)
import Data.Traversable
import Data.Foldable
import Prelude hiding (mapM_)

data LL a = Empty | Node a (LL a)
    deriving (Show, Eq, Functor, Foldable, Traversable)

-- | bumps counter
genId :: (Monad m, Functor m, Enum e) => StateT e m e
genId = get <* modify succ

-- | given a value, initializes a new node value
newNode :: (Monad m, Functor m, Enum e) => a -> StateT e m (a,e)
newNode x = genId >>= return . (x,)

example :: LL Int
example = Node 1 (Node 2 (Node 3 (Node 1 Empty)))

-- | tags the linked list with an extra value
tagged :: (Traversable f, Enum e, Monad m, Functor m)
       => f a -> StateT e m (f (a,e))
tagged = traverse newNode

-- | inserts e after the first node whose value satisfies cond
insertAfter :: (a -> Bool) -> a -> LL a -> LL a
insertAfter cond e ll = case ll of
    Empty -> Empty
    Node v vs -> Node v (if cond v
                            then Node e vs
                            else insertAfter cond e vs)

demo :: StateT Int IO ()
demo = do
    -- ll1 = Node (1,0) (Node (2,1) (Node (3,2) (Node (1,3) Empty)))
    ll1 <- tagged example
    nd <- newNode 10
    let tagIs t = (== t) . snd
        ll2 = insertAfter (tagIs 0) nd ll1
        -- ll2 = Node (1,0) (Node (10,4) (Node (2,1) (Node (3,2) (Node (1,3) Empty))))
        ll3 = insertAfter (tagIs 3) nd ll1
        -- ll3 = Node (1,0) (Node (2,1) (Node (3,2) (Node (1,3) (Node (10,4) Empty))))
    liftIO $ mapM_ print [ll1,ll2,ll3]

main :: IO ()
main = evalStateT demo (0 :: Int)
In this demo, tagIs is essentially doing the "object equality" thing because it is only interested in the extra tag we added before. Notice that I cheated here in order to specify the two nodes whose "values" are 1: one tagged 0 and the other tagged 3. Before running the program, it's impossible to tell what the actual tags will be (just like hard-coding a pointer value and hoping it happens to work). In a more realistic setting, you would need another function that scans through the linked list and collects the tags of the nodes with a certain value (in this example, searching the linked list for all the nodes with "value" 1 would give you [0,3]) to work with.
"Object equality" seems more like a concept from imperative programming languages, which assume that there are allocators offering "references" or "pointers" so that we can talk about "object equality". Here we have to simulate that allocator ourselves; I guess this is what makes it a little awkward to deal with in functional programming.
Kristopher Micinski remarked that you actually can do something similar with the ST monad, and you can do it with IO as well. Specifically, you can create an STRef or IORef, which is a sort of mutable box. The box can only be accessed using IO or ST actions as appropriate, which maintains the clean separation between "pure" and "impure" code. These references have identity—asking if two are equal tells you if they are actually the same box, rather than whether they have the same contents. But this is not really so pleasant, and not something you're likely to do without a good reason.
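A small sketch of what that identity looks like with IORef (the function name sameBoxDemo is made up for this example). The Eq instance for IORef compares the references themselves, not their contents, so two separately created boxes holding the same value are still unequal.

```haskell
import Data.IORef (newIORef)

-- | Hypothetical demo: returns (a == a, a == b) for two boxes that
-- both contain 1. Equality on IORef is identity of the box.
sameBoxDemo :: IO (Bool, Bool)
sameBoxDemo = do
    a <- newIORef (1 :: Int)
    b <- newIORef 1
    pure (a == a, a == b)
```

Running sameBoxDemo yields (True, False): a is the same box as itself, but a and b are different boxes even though their contents agree.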
No, because it would break referential transparency. The results from calling a method with the same input multiple times should be indistinguishable, and it should be possible to replace it transparently with calling the method with that input once and then re-using the result. However, calling a method that returns some structure multiple times may produce a new copy of the structure every time -- structures with different "identity". If you could somehow tell that they have different identities, then it violates referential transparency.

Binding together data, types and functions

I want to model a large tree (or forest) with a regular structure: the tree can be decomposed into a small tree (the irregular part) and, for example, a large list of params, where each param combined with each node of the small tree makes a node of the big tree.
So I want a data structure in which each node of the tree represents many nodes, and a real node has type (node,param).
For algorithms that work on this kind of tree, the type of the param does not matter; params are just placeholders. But it should be possible to extract some data from a plain param or from a combination of node and param, and all possible params should be iterable. All of these kinds of data are known a priori; they reflect the semantics of the tree.
So the actual type and semantics of param are up to the implementation of the tree.
I model it in C++ using nested typedefs for the param type, fixed method names for everything that should be available to the algorithm (these two together make a concept), and templates for the algorithm itself.
For example, if I want to associate an integer with each node of the big tree, I would provide a function int data(const node& n, const param& p), where param is available as a nested typedef. The algorithm could then get the list of all available params and call data with the nodes of interest and each of the params.
I have some plain data type, e.g. tree data, like this
data Tree = Node [Tree] | Leaf
Now I want to package up:
concrete tree
some type
some values of that type
some functions operating on (that concrete) tree nodes and (that) values
so that one can write functions that use these packaged-up types and functions in a generic way.
How can I achieve that?
With type families I came to
class PackagedUp t where
    type Value t
    tree :: Tree t
    values :: [Value t]
    f :: Tree t -> Value t -> Int
Tree now becomes Tree t because type families want the types of their members to depend on the type class argument.
Also, as in https://stackoverflow.com/a/16927632/1227578, type families that deal with injectivity will be needed.
With this I can
instance PackagedUp MyTree where
    type Value MyTree = (Int,Int)
    tree = Leaf
    values = [(0,0),(1,1)]
    f t v = fst v
How do I write such a function now, i.e. a function that takes the root of a tree and all of the values and produces an [Int] of every f tree value?
First of all, your tree type should be defined like this:
data Tree a = Node a [Tree a] | Leaf
The type above is polymorphic. As far as semantics go that resembles what we would call a generic type in OO parlance (in C# or Java we might write Tree<A> instead). A node of a Tree a holds a value of type a and a list of subtrees.
Next, we come to PackagedUp. Classes in Haskell have little to do with the OO concept of the same name; they are not meant to package data and behaviour together. Things are actually much simpler: all you need to do is define the appropriate functions for your tree type:
getRoot :: Tree a -> Maybe a
getRoot Leaf = Nothing
getRoot (Node x _) = Just x
(Returning Maybe a is a simple way to handle failure with type safety. Think of the Nothing value as a polite cousin of null that doesn't explode with null reference exceptions.)
One thing that type classes are good at is expressing interfaces for data structure algorithms, such as the ones you allude to. One of the most common classes is Functor, which provides a general interface for mapping over data structures.
instance Functor Tree where
    fmap f Leaf = Leaf
    -- ts is a list of subtrees, so map over the list (outer fmap)
    -- and over each subtree inside it (inner fmap)
    fmap f (Node x ts) = Node (f x) (fmap (fmap f) ts)
fmap has the following polymorphic type:
fmap :: Functor f => (a -> b) -> f a -> f b
With your tree, it specialises to
fmap :: (a -> b) -> Tree a -> Tree b
and with lists (as in the outer fmap of fmap (fmap f) ts) it becomes
fmap :: (a -> b) -> [a] -> [b]
Finally, the Data.Tree module provides a data structure which looks a lot like what you want to define.
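To connect this back to the question, here is one way the "every node with every param" function could look with a polymorphic tree. This is only a sketch under assumptions: allData is a made-up name, the extraction function is passed in as a plain argument instead of coming from a class, and Functor and Foldable are derived for brevity where the text above writes the Functor instance by hand.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}
import Data.Foldable (toList)

data Tree a = Node a [Tree a] | Leaf
    deriving (Show, Functor, Foldable)

-- | Hypothetical helper: apply f to every (node value, param) pair.
-- toList flattens the tree; the comprehension pairs each node value
-- with each param, in that nesting order.
allData :: (n -> p -> Int) -> Tree n -> [p] -> [Int]
allData f t params = [f n p | n <- toList t, p <- params]
```

For instance, with the params [(0,0),(10,1)] and a function that adds the node value to the first component of the param, a tree holding 1, 2 and 3 produces [1,11,2,12,3,13].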

Resources