Simple question: can I use the API under Data.Vector.Fusion.Bundle? Is it considered public and stable? And if yes: is there more information available how to use it?
An example would be writing gathering operations in trees where data is stored in vectors in the leaves:
data Tree a = Empty | Node a (Tree a) (Tree a) | Leaf a
toBundle Empty = B.empty
toBundle (Leaf x) = B.singleton x
toBundle (Node x l r) = toBundle l B.++ B.singleton x B.++ toBundle r
toVector t = V.unfoldrN s (\b -> Just (B.head b, B.tail b)) b
where b = toBundle t
(Just s) = B.upperBound (B.size b)
(There are no vectors in the leaves ... just imagine them :) )
As I wrote this I found out there is no Bundle v a -> v a function converting the bundle back to a vector ... am I missing something?
I think what you're looking for is the function: unstream :: Vector v a => Bundle v a -> v a as implemented in the data.vector.generic.(new) module.
Refering to your toVector code example, I would rather suggest to avoid relying on the size hint given for the manually constructed stream/bundle and instead use a (possibly) more efficient method for the construction of the vector, e.g. by using the function: generateM :: Monad m => Int -> (Int -> m a) -> m (Vector a).
In addition, from a pure conceptual view, I'm wondering whether Leaf a as it stands is basically equivalent to Node a Empty Empty and therefore possibly redundant? How's this: type Leaf a = Node a Empty Empty?
Related
I am trying to use an unfold function to build trees.
Tree t = Leaf | Node (Tree t) t (Tree t)
unfoldT :: (b -> Maybe (b, a, b)) -> b -> Tree a
unfoldT f b =
case f b of
Nothing -> Leaf
Just (lt, x, rt) -> Node (unfoldT f lt) x (unfoldT f rt)
The build function needs to create a tree that has a height equal to the number provided, as well as be numbered in an in-order fashion. The base case being build 0 = Leaf and the next being build 1 = Node (Leaf 0 Leaf).
build :: Integer -> Tree Integer
My attempt at solving it:
build n = unfoldT (\x -> Just x) [0..2^n-2]
I am not entirely sure how to go about constructing the tree here.
Would love it if somebody could point me in the right direction.
Edit 1:
If I was to use a 2-tuple, what would I combine? I need to be able to refer to the current node, its left subtree and its right subtree somehow right?
If I was to use a 2-tuple, what would I combine?
I would recommend to pass the remaining depth as well as the offset from the left:
build = unfoldT level . (0,)
where
level (_, 0) = Nothing
level (o, n) = let mid = 2^(n-1)
in ((o, n-1), o+mid-1, (o+mid, n-1))
If I was to use a 2-tuple, what would I combine?
That's the key question behind the state-passing paradigm in functional programming, expressed also with the State Monad. We won't be dealing with the latter here, but maybe use the former.
But before that, do we really need to generate all the numbers in a list, and then work off that list? Don't we know in advance what are the numbers we'll be working with?
Of course we do, because the tree we're building is totally balanced and fully populated.
So if we have a function like
-- build2 (depth, startNum)
build2 :: (Int, Int) -> Tree Int
we can use it just the same to construct both halves of e.g. the build [0..14] tree:
build [0..14] == build2 (4,0) == Node (build2 (3,0)) 7 (build2 (3,8))
Right?
But if we didn't want to mess with the direct calculations of all the numbers involved, we could arrange for the aforementioned state-passing, with the twist to build2's interface:
-- depth, startNum tree, nextNum
build3 :: (Int, Int) -> (Tree Int, Int)
and use it like
build :: Int -> Tree Int -- correct!
build depth = build3 (depth, 0) -- intentionally incorrect
build3 :: (Int, Int) -> (Tree Int, Int) -- correct!
build3 (depth, start) = Node lt n rt -- intentionally incorrect
where
(lt, n) = build3 (depth-1, start) -- n is returned
(rt, m) = build3 (depth-1, n+1) -- and used, next
You will need to tweak the above to make all the pieces fit together (follow the types!), implementing the missing pieces of course and taking care of the corner / base cases.
Formulating this as an unfold would be the next step.
Some common performance advice in Haskell is to make fast data structures "spine strict" so that the structure, but not necessarily its contents, is fully evaluated as it is created. This lets us do more work when we insert a value and the structure is in cache as opposed to putting it off until we look a value up.
With a normal data type, like the binary trie from Data.IntMap, this can be accomplished by making the relevant fields in the data structure strict:
data IntMap a = Bin {- ... -} !(IntMap a) !(IntMap a)
| {- ... -}
(Excerpt from the Data.IntMap.Base source.)
How can I achieve the same behavior if I want to store the children in a vector rather than directly as fields of Bin?
data IntMap a = Bin {- ... -} (Vector (IntMap a))
| {- ... -}
First, I'll answer a simple variant of the question:
If your data type is unboxable, e.g. you want a strict vector of Ints,
use Data.Vector.Unboxed.
As a free bonus, the implementation allows you to have "structure of arrays", (Vector a, Vector b), even the interface
is less error-prone "array of structures", Vector (a, b).
See Wikipedia on AOS and SOA.
Yet, in the OPs question, we want to stick IntMap a into Vector, and
IntMap isn't unboxable (or storable or primitive).
The various options boil down to the same idea: you have to seq values yourself.
Whether you go for
Data.Primitive.Array
or implementing own Data.Vector.Strict on top of Data.Vector (note: basicClear can be no-op as
it is for unboxed vectors, or you can use unsafeCoerce () as a dummy value),
you will seq values. This is how
Data.Map.Strict is implemented on top
of the same lazy structure as Data.Map.Lazy.
For example
map
Data.Map.Strict is implemented as:
map :: (a -> b) -> Map k a -> Map k b
map f = go
where
go Tip = Tip
go (Bin sx kx x l r) = let !x' = f x in Bin sx kx x' (go l) (go r)
Compare that to Data.Map.Lazy.map:
map :: (a -> b) -> Map k a -> Map k b
map f = go where
go Tip = Tip
go (Bin sx kx x l r) = Bin sx kx (f x) (go l) (go r)
I've been thinking in how to implement the equivalent of unfold for the following type:
data Tree a = Node (Tree a) (Tree a) | Leaf a | Nil
It was not immediately obvious since the standard unfold for lists returns a value and the next seed. For this datatype, it doesn't make sense, since there is no "value" until you reach a leaf node. This way, it only really makes sense to return new seeds or stop with a value. I'm using this definition:
data Drive s a = Stop | Unit a | Branch s s deriving Show
unfold :: (t -> Drive t a) -> t -> Tree a
unfold fn x = case fn x of
Branch a b -> Node (unfold fn a) (unfold fn b)
Unit a -> Leaf a
Stop -> Nil
main = print $ unfold go 5 where
go 0 = Stop
go 1 = Unit 1
go n = Branch (n - 1) (n - 2)
While this seems to work, I'm not sure this is how it is supposed to be. So, that is the question: what is the correct way to do it?
If you think of a datatype as the fixpoint of a functor then you can see that your definition is the sensible generalisation of the list case.
module Unfold where
Here we start by definition the fixpoint of a functor f: it's a layer of f followed by some more fixpoint:
newtype Fix f = InFix { outFix :: f (Fix f) }
To make things slightly clearer, here are the definitions of the functors corresponding to lists and trees. They have basically the same shape as the datatypes except that we have replace the recursive calls by an extra parameter. In other words, they describe what one layer of list / tree looks like and are generic over the possible substructures r.
data ListF a r = LNil | LCons a r
data TreeF a r = TNil | TLeaf a | TBranch r r
Lists and trees are then respectively the fixpoints of ListF and TreeF:
type List a = Fix (ListF a)
type Tree a = Fix (TreeF a)
Anyways, hopping you now have a better intuition about this fixpoint business, we can see that there is a generic way of defining an unfold function for these.
Given an original seed as well as a function taking a seed and building one layer of f where the recursive structure are new seeds, we can build a whole structure:
unfoldFix :: Functor f => (s -> f s) -> s -> Fix f
unfoldFix node = go
where go = InFix . fmap go . node
This definition specialises to the usual unfold on list or your definition for trees. In other words: your definition was indeed the right one.
Good day!
I have a tree of elements:
data Tree a = Node [Tree a]
| Leaf a
I need to get to the leaf of that tree. The path from the root to the leaf is determined by a sequence of switch functions. Each layer in that tree corresponds to a specific switch function that takes something as a parameter and returns an index to the subtree.
class Switchable a where
getIndex :: a -> Int
data SwitchData = forall c. Switchable c => SwitchData c
The final goal is to provide a function that expects all necessary SwitchData and returns
the leaf in a tree.
But I see no way
to enforce the one-to-one mapping at type-level. My current implementation of switch
functions accepts a list of SwitchData instances:
switch :: SwitchData -> Tree a -> Tree a
switch (SwitchData c) (Node vec) = vec `V.unsafeIndex` getIndex c
switch _ _ = error "switch: performing switch on a Leaf of context tree"
switchOnData :: [SwitchData] -> Tree a -> a
switchOnData (x:xs) tree = switchOnData xs $ switch x tree
switchOnData [] (Leaf c) = c
switchOnData [] (Node _) = error "switchOnData: partial data"
The order and the exact types of Switchable instances
are not considered neither at compile time nor at runtime, the correctness is left for a programmer, which bothers me a lot. That gist reflects the current state of affair.
Could you suggest some ways of establishing that one-to-one mapping between layers in a context tree and particular instances of Switchable?
Your current solution is equivalent to simply switchOnIndices :: [Int] -> Tree a -> a (simply apply getIndex before storing in list!). Make this explicitly "partial" by wrapping the return in Maybe and this may actually be the ideal signature, simple and fine.
But apparently, your real use case is more complex; you want to have basically different dictionaries at each level. Then you actually need to link the multiple levels of types to those of tree depths. You're in for some crazy almost-dependent-typed hackery!
{-# LANGUAGE GADTs, TypeOperators, LambdaCase #-}
infixr 5 :&
data Nil = Nil
data s :& ss where
(:&) :: s -> ss -> s :& ss
data Tree switch a where
Node ::
(s -> Tree ss a) -- Such a function is equiv. to `Switchable c => (c, [Tree a])`.
-> Tree (s :& ss) a
Leaf :: a -> Tree Nil a
switch :: s -> Tree (s :& ss) a -> Tree ss a
switch s (Node f) = f s
switchOnData :: s -> Tree s a -> a
switchOnData sw (Node f) = switchOnData ss $ f s
where (s :& ss) = sw
switchOnData _ (Leaf a) = a
data Sign = P | M
signTree :: Tree (Sign :& Sign :& Nil) Char
signTree = Node $ \case P -> Node $ \case P -> Leaf 'a'
M -> Leaf 'b'
M -> Node $ \case P -> Leaf 'c'
M -> Leaf 'd'
testSwitch :: IO()
testSwitch = print $ switchOnData (P :& M :& Nil) signTree
main = testSwitch
Of course, this greatly limits the flexibility of the tree structure: each level has a fixed predetermined number of nodes. In fact, that makes the entire structure of e.g. signTree equivalent to simply (Sign, Sign) -> Char, so unless you really need a tree for some specific reason (e.g. extra information attached to the nodes), why not just use that!
Or, again, the way simpler [Int] -> Tree a -> a signature. But using existentials makes no sense to me at all here.
Let's assume that we're traversing a Tree at a particular Node and have our particular kind of desired polymorphic Switchable value available as well. In other words, we want to take a step
step :: Switchable -> Tree a -> Maybe (Tree a)
Here the output Tree is one of the children of the input Tree and we wrap it in a Maybe just in case something goes wrong. So let's try to write step
step s Leaf{} = Nothing
step s (Node children) = switch s (?extract children)
Here, the challenge arises out of defining ?extract as we need it to polymorphically produce the right kind of value for switch s to operate on. We can't meaningfully use polymorphism to get this to work:
-- won't work!
class Extractable b where
extract :: [Tree a] -> b
since now switch . extract is ambiguous. In fact, if Switchable operates by existentially quantifying a value that it needs passed in then there is no way (outside of Typeable) to build an extract that works properly. Instead, we need each Switchable to have its own individual extract with the proper type, a type that only exists inside of the context of the data type.
{-# LANGUAGE ExistentialQuantification #-}
data Switchable = forall pass . Switchable
{ extract :: [Tree a] -> pass
, switch :: pass -> Int
}
step s (Node children) = children `safeLookup` switch s (extract s children)
But due to the containment of this existential type, we know that there's absolutely no way to use a Switchable except to immediately compute the existentially hidden pass value and then consume it with switch. There's not even any reason to store pass since the only way we can get any information out of it is to use switch.
Altogether this leads to the intuition that Switchable ought to be this instead.
data Switchable = Switchable { switch :: [Tree a] -> Int }
step s (Node children) = children `safeLookup` switch s children
Or even
data Switchable = Switchable { switch :: [Tree a] -> Tree a }
step s (Node children) = Just (switch s children)
I am impatient, looking forward to understanding catamorphism related to this SO question :)
I have only practiced the beginning of Real World Haskell tutorial. So, Maybe I'm gonna ask for way too much right now, if it was the case, just tell me the concepts I should learn.
Below, I quote the wikipedia code sample for catamorphism.
I would like to know your opinion about foldTree below, a way of traversing a Tree, compared to this other SO question and answer, also dealing with traversing a Tree n-ary tree traversal. (independantly from being binary or not, I think the catamorphism below can be written so as to manage n-ary tree)
I put in comment what I understand, and be glad if you could correct me, and clarify some things.
{-this is a binary tree definition-}
data Tree a = Leaf a
| Branch (Tree a) (Tree a)
{-I dont understand the structure between{}
however it defines two morphisms, leaf and branch
leaf take an a and returns an r, branch takes two r and returns an r-}
data TreeAlgebra a r = TreeAlgebra { leaf :: a -> r
, branch :: r -> r -> r }
{- foldTree is a morphism that takes: a TreeAlgebra for Tree a with result r, a Tree a
and returns an r -}
foldTree :: TreeAlgebra a r -> Tree a -> r
foldTree a#(TreeAlgebra {leaf = f}) (Leaf x ) = f x
foldTree a#(TreeAlgebra {branch = g}) (Branch l r) = g (foldTree a l) (foldTree a r)
at this point I am having many difficulties, I seem to guess that the morphism leaf
will be applied to any Leaf
But so as to use this code for real, foldTree needs to be fed a defined TreeAlgebra,
a TreeAlgebra that has a defined morphism leaf so as to do something ?
but in this case in the foldTree code I would expect {f = leaf} and not the contrary
Any clarification from you would be really welcome.
Not exactly sure what you're asking. But yeah, you feed a TreeAlgebra to foldTree corresponding to the computation you want to perform on the tree. For example, to sum all the elements in a tree of Ints you would use this algebra:
sumAlgebra :: TreeAlgebra Int Int
sumAlgebra = TreeAlgebra { leaf = id
, branch = (+) }
Which means, to get the sum of a leaf, apply id (do nothing) to the value in the leaf. To get the sum of a branch, add together the sums of each of the children.
The fact that we can say (+) for branch instead of, say, \x y -> sumTree x + sumTree y is the essential property of the catamorphism. It says that to compute some function f on some recursive data structure it suffices to have the values of f for its immediate children.
Haskell is a pretty unique language in that we can formalize the idea of catamorphism abstractly. Let's make a data type for a single node in your tree, parameterized over its children:
data TreeNode a child
= Leaf a
| Branch child child
See what we did there? We just replaced the recursive children with a type of our choosing. This is so that we can put the subtrees' sums there when we are folding.
Now for the really magical thing. I'm going to write this in pseudohaskell -- writing it in real Haskell is possible, but we have to add some annotations to help the typechecker which can be kind of confusing. We take the "fixed point" of a parameterized data type -- that is, constructing a data type T such that T = TreeNode a T. They call this operator Mu.
type Mu f = f (Mu f)
Look carefully here. The argument to Mu isn't a type, like Int or Foo -> Bar. It's a type constructor like Maybe or TreeNode Int -- the argument to Mu itself takes an argument. (The possibility of abstracting over type constructors is one of the things that makes Haskell's type system really stand out in its expressive power).
So the type Mu f is defined as taking f and filling in its type parameter with Mu f itself. I'm going to define a synonym to reduce some of the noise:
type IntNode = TreeNode Int
Expanding Mu IntNode, we get:
Mu IntNode = IntNode (Mu IntNode)
= Leaf Int | Branch (Mu IntNode) (Mu IntNode)
Do you see how Mu IntNode is equivalent to your Tree Int? We have just torn the recursive structure apart and then used Mu to put it back together again. This gives us the advantage that we can talk about all Mu types at once. This gives us what we need to define a catamorphism.
Let's define:
type IntTree = Mu IntNode
I said the essential property of the catamorphism is that to compute some function f, it suffices to have the values of f for its immediate children. Let's call the type of the thing we are trying to compute r, and the data structure node (IntNode would be a possible instantiation of this). So to compute r on a particular node, we need the node with its children replaced with their rs. This computation has type node r -> r. So a catamorphism says that if we have one of these computations, then we can compute r for the entire recursive structure (remember recursion is denoted explicitly here with Mu):
cata :: (node r -> r) -> Mu node -> r
Making this concrete for our example, this looks like:
cata :: (IntNode r -> r) -> IntTree -> r
Restating, if we can take a node with rs for its children and compute an r, then we can compute an r for an entire tree.
In order to actually compute this, we need node to be a Functor -- that is we need to be able to map an arbitrary function over the children of a node.
fmap :: (a -> b) -> node a -> node b
This can be done straightforwardly for IntNode.
fmap f (Leaf x) = Leaf x -- has no children, so stays the same
fmap f (Branch l r) = Branch (f l) (f r) -- apply function to each child
Now, finally, we can give a definition for cata (the Functor node constraint just says that node has a suitable fmap):
cata :: (Functor node) => (node r -> r) -> Mu node -> r
cata f t = f (fmap (cata f) t)
I used the parameter name t for the mnemonic value of "tree". This is an abstract, dense definition, but it is really very simple. It says: recursively perform cata f -- the computation we are doing over the tree -- on each of t's children (which are themselves Mu nodes) to get a node r, and then pass that result to f compute the result for t itself.
Tying this back to the beginning, the algebra you are defining is essentially a way of defining that node r -> r function. Indeed, given a TreeAlgebra, we can easily get the fold function:
foldFunction :: TreeAlgebra a r -> (TreeNode a r -> r)
foldFunction alg (Leaf a) = leaf alg a
foldFunction alg (Branch l r) = branch alg l r
Thus the tree catamorphism can be defined in terms of our generic one as follows:
type Tree a = Mu (TreeNode a)
treeCata :: TreeAlgebra a r -> (Tree a -> r)
treeCata alg = cata (foldFunction alg)
I'm out of time. I know that got really abstract really fast, but I hope it at least gave you a new viewpoint to help your learning. Good luck!
I think you were were asking a question about the {}'s. There is an earlier question with a good discussion of {}'s. Those are called Haskell's record syntax. The other question is why construct the algebra. This is a typical function paradigm where you generalize data as functions.
The most famous example is Church's construction of the Naturals, where f = + 1 and z = 0,
0 = z,
1 = f z,
2 = f (f z),
3 = f (f (f z)),
etc...
What you are seeing is essentially the same idea being applied to a tree. Work the church example and the tree will click.