How can I have a vector that's strict in its values, like a normal type with bangs (!)? - haskell

Some common performance advice in Haskell is to make fast data structures "spine strict", so that the structure, but not necessarily its contents, is fully evaluated as it is created. This lets us do more work when we insert a value, while the structure is in cache, rather than putting it off until we look a value up.
With a normal data type, like the binary trie from Data.IntMap, this can be accomplished by making the relevant fields in the data structure strict:
data IntMap a = Bin {- ... -} !(IntMap a) !(IntMap a)
              | {- ... -}
(Excerpt from the Data.IntMap.Base source.)
How can I achieve the same behavior if I want to store the children in a vector rather than directly as fields of Bin?
data IntMap a = Bin {- ... -} (Vector (IntMap a))
              | {- ... -}

First, I'll answer a simpler variant of the question:
If your element type is unboxable, e.g. you want a strict vector of Ints, use Data.Vector.Unboxed.
As a free bonus, the implementation stores the data as a "structure of arrays", (Vector a, Vector b), even though the interface is the less error-prone "array of structures", Vector (a, b).
See Wikipedia on AoS and SoA.
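For instance, a minimal sketch (the element type (Int, Double) is just an illustration):
import qualified Data.Vector.Unboxed as VU

-- An unboxed vector of pairs: every element is fully evaluated when the
-- vector is built, and internally the pairs are stored as two flat arrays.
pairs :: VU.Vector (Int, Double)
pairs = VU.fromList [(1, 2.5), (2, 3.5), (3, 4.5)]

firsts :: VU.Vector Int
firsts = VU.map fst pairs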
Yet, in the OP's question, we want to stick IntMap a into a Vector, and IntMap isn't unboxable (or storable, or primitive).
The various options boil down to the same idea: you have to seq the values yourself. Whether you go for Data.Primitive.Array or implement your own Data.Vector.Strict on top of Data.Vector (note: basicClear can be a no-op, as it is for unboxed vectors, or you can use unsafeCoerce () as a dummy value), you will be seq'ing the values. This is how Data.Map.Strict is implemented on top of the same lazy structure as Data.Map.Lazy.
For example, map in Data.Map.Strict is implemented as:
map :: (a -> b) -> Map k a -> Map k b
map f = go
  where
    go Tip = Tip
    go (Bin sx kx x l r) = let !x' = f x in Bin sx kx x' (go l) (go r)
Compare that to Data.Map.Lazy.map:
map :: (a -> b) -> Map k a -> Map k b
map f = go
  where
    go Tip = Tip
    go (Bin sx kx x l r) = Bin sx kx (f x) (go l) (go r)
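If you stay with a boxed Vector, the same "seq the values yourself" idea might look like this (a minimal sketch; seqVector and mapStrict are illustrative helpers, not part of the vector library):
import qualified Data.Vector as V

-- Force every element of a boxed vector to WHNF, then return the vector.
seqVector :: V.Vector a -> V.Vector a
seqVector v = V.foldl' (\() x -> x `seq` ()) () v `seq` v

-- A value-strict map in the spirit of Data.Map.Strict.map: build the
-- result, then force all of its elements before handing it back.
mapStrict :: (a -> b) -> V.Vector a -> V.Vector b
mapStrict f = seqVector . V.map f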

Related

Avoiding thunks in sparsely evaluated list generated by monadic unfold

I have a simulation library that uses the FFI wrapped in a monad M, carrying a context. All the foreign functions are pure, so I've decided to make the monad lazy, which is normally convenient for flow-control. I represent my simulation as a list of simulation-frames, that I can consume by either writing to a file, or by displaying the frame graphically.
simulation :: [(Frame -> M Frame)] -> Frame -> M [Frame]
simulation [] frame = return [frame]
simulation (step:steps) frame
  = step frame >>= fmap (frame:) . simulation steps
Each frame consists of a tuple of newtype-wrapped ForeignPtrs that I can lift to my Haskell representation with
lift :: Frame -> M HFrame
Since the time-steps in my simulation are quite short, I only want to look at every n frames, for which I use
takeEvery n l = foldr cons nil l 0 where
  nil _ = []
  cons x rest 0 = x : rest n
  cons x rest n = rest (n-1)
So my code looks something like
main = consume
     $ takeEvery n
     $ runM
     $ simulation steps initialFrame >>= mapM lift
Now, the problem is that as I increase n, a thunk builds up. I've tried a couple of different ways to try to strictly evaluate each frame in simulation, but I have yet to figure out how to do so. ForeignPtr doesn't appear to have a NFData instance, so I can't use deepseq, but all my attempts with seq, including using seq on each element in the tuple, have been without noticeable effect.
EDIT:
Upon request, I have included more specifics, that I initially excluded since I think they are probably mostly noise for this question.
The monad
newtype FT c a = FT (Context -> a)

instance Functor (FT c) where
  fmap f (FT a) = FT (f . a)

instance Applicative (FT c) where
  pure a = FT (\_ -> a)
  (<*>) (FT a) (FT b) = FT (\c -> a c $ b c)

instance Monad (FT c) where
  return = pure
  (>>=) (FT a) f = FT (\c -> (\(FT b) -> b c) $ f $ a c)

runFTIn :: Context -> (forall c. FT c a) -> a
runFTIn context (FT a) = a context

runFTWith :: [ContextOption] -> (forall c. FT c a) -> a
runFTWith options a
  = unsafePerformIO
  $ getContext options >>= \c -> return $ runFTIn c a

runFT = runFTWith []

unsafeLiftFromIO :: (Context -> IO a) -> FT c a
unsafeLiftFromIO a = FT (\c -> unsafePerformIO $ a c)
All the foreign functions are lifted from IO with unsafeLiftFromIO
newtype Box c = Box (ForeignPtr RawBox)
newtype Coordinates c = Coordinates (ForeignPtr RawCoordinates)
type Frame c = (Box c, Coordinates c)
liftBox :: Box c -> FT c HBox
liftCoordinates :: Coordinates c -> FT c HCoordinates
liftFrame (box, coordinates) = do
  box' <- liftBox box
  coordinates' <- liftCoordinates coordinates
  return (box', coordinates')
The steps themselves are supposed to be arbitrary (Frame c -> FT c (Frame c)), so strictness should preferably be in the higher level code.
EDIT2:
I have now tried to use Streamly; however, the problem persists, so I think the issue really is finding a way to strictly evaluate the ForeignPtrs.
Current implementations:
import Streamly
import qualified Streamly.Prelude as S
import qualified Streamly.Internal.Data.Stream.Serial as Serial
takeEvery n = Serial.unfoldrM ((fmap.fmap) (\(h, t) -> (h, S.drop (n-1) t)) . S.uncons)
(#) = flip ($)
simulation
  :: (IsStream t)
  => Frame c
  -> t (FT c) (Frame c -> FT c (Frame c))
  -> t (FT c) (Frame c)
simulation frame = S.scanlM' (#) frame
EDIT3:
To clarify the symptoms and how I have diagnosed the problem.
The library calls OpenCL functions running on a GPU. I am sure that the freeing of the pointers is handled correctly - the ForeignPtrs have the correct freeing functions, and memory use is independent of total number of steps as long as this number is larger than n. What I find is that memory use on the GPU is basically linearly correlated to n. The consumer I've been using for this testing is
import qualified Data.ByteString.Lazy as BL
import Data.Binary
import Data.Binary.Put
writeTrajectory fn = fmap (BL.writeFile fn . runPut) . S.foldr ((>>).putFrame) (pure ()) . serially
For my streamly implementation, and
writeTrajectory fn = BL.writeFile fn . runPut . mapM_ putFrame
For the original implementation. Both should consume the stream continuously. I've generated the steps for testing with replicate.
I am unsure of how to more precisely analyze the memory-use on the GPU. System memory use is not an issue here.
Update:
I am starting to think it's not a matter of strictness, but of GC problems. The run-time system does not know the size of the memory allocated on the GPU, and so does not know to collect the pointers. This is less of an issue when there is work going on CPU-side as well, since that produces allocations too, triggering the GC. It would explain the slightly non-deterministic, but linearly n-correlated, memory usage that I've seen. How to solve this nicely is another issue, but I suspect it will mean a substantial overhaul of my code.
I think the issue really is finding a way to strictly evaluate ForeignPtrs
If that is really the issue, one way to do that is to change the second clause of simulation:
{-# LANGUAGE BangPatterns #-}

simulation :: [(Frame -> M Frame)] -> Frame -> M [Frame]
simulation [] frame = return [frame]
simulation (step:steps) frame@(!_, !_) -- Evaluate both components of the pair
  = step frame >>= fmap (frame:) . simulation steps

Haskell pattern matching on vectors

Is it possible to use list style pattern matching on vectors?
ie
import qualified Data.Vector as V
f :: V.Vector a -> a
f (x:xs) = x
gives an error
-XViewPatterns can let you do this:
{-# LANGUAGE ViewPatterns #-}
module VecViewPats where
import Data.Vector (Vector)
import qualified Data.Vector as V
uncons :: Vector a -> Maybe (a, Vector a)
uncons v = if V.null v
           then Nothing
           else Just (V.unsafeHead v, V.unsafeTail v)

vsum :: Num a => Vector a -> a
vsum (uncons -> Just (a, av)) = a + vsum av
vsum (uncons -> Nothing)      = 0
Or -XLambdaCase
import Control.Category ((>>>))
-- ...
vsum :: Num a => Vector a -> a
vsum = uncons >>> \case
  Just (a, av) -> a + vsum av
  Nothing      -> 0
But honestly, that seems like a bit of a code smell, as you're using one data structure (Vector) as another ([]), which suggests that maybe your choice of data structure is off.
If you really just want to treat it like a list for the purposes of some algorithm, why not use toList?
@Cactus has pointed out that -XPatternSynonyms (introduced in GHC 7.8) combined with -XViewPatterns can be used to pattern match on vectors. I am here to extend his comment a bit further.
pattern Empty <- (V.null -> True)
The above defines a pattern synonym Empty for the empty vector. The Empty pattern matches against an empty vector using the view pattern (V.null -> True). However, it cannot be used as an expression elsewhere, i.e. it is a uni-directional synonym, since the system doesn't really know what Empty is as a Vector (we only know that null v is True, but there could be other vectors giving True as well).
To remedy this, a where clause can be added specifying that Empty actually is the empty vector, making it a bi-directional synonym, along with a type signature:
pattern Empty :: Vector a
pattern Empty <- (V.null -> True) where Empty = V.empty
This pattern synonym can be used to define uncons without an if expression:
uncons :: Vector a -> Maybe (a, Vector a)
uncons Empty = Nothing
uncons v     = Just (V.unsafeHead v, V.unsafeTail v)
We use uncons to define a uni-directional synonym. Note that I don't make it bi-directional, since cons is costly for vectors:
pattern (:<|) :: a -> Vector a -> Vector a
pattern x :<| xs <- (uncons -> Just (x, xs))
Anyway, we are finally able to pattern match on vectors just like lists:
vsum :: Num a => Vector a -> a
vsum Empty = 0
vsum (x :<| xs) = x + vsum xs
A complete code is here.
Vectors aren't intended for that kind of pattern matching. They were created to give Haskell array-like sequences, i.e. collections that can be accessed efficiently at any position, unlike lists.
The closest thing to what you wrote would be something like this:
f v = V.head v
Or, if recursion is what you are looking for, the tail function will get the rest of the vector.
But if you are trying to do something that moves along a list like that, there are Vector equivalents of functions such as foldl, find, map, and the like. It depends on what you intend to do.
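For instance, the vsum above can be written with vector's own strict fold instead of explicit recursion:
import qualified Data.Vector as V

-- Sum of a vector using a strict left fold; no pattern matching needed.
vsum :: Num a => V.Vector a -> a
vsum = V.foldl' (+) 0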

What is the correct definition of `unfold` for an untagged tree?

I've been thinking in how to implement the equivalent of unfold for the following type:
data Tree a = Node (Tree a) (Tree a) | Leaf a | Nil deriving Show
It was not immediately obvious since the standard unfold for lists returns a value and the next seed. For this datatype, it doesn't make sense, since there is no "value" until you reach a leaf node. This way, it only really makes sense to return new seeds or stop with a value. I'm using this definition:
data Drive s a = Stop | Unit a | Branch s s deriving Show
unfold :: (t -> Drive t a) -> t -> Tree a
unfold fn x = case fn x of
  Branch a b -> Node (unfold fn a) (unfold fn b)
  Unit a     -> Leaf a
  Stop       -> Nil

main = print $ unfold go 5 where
  go 0 = Stop
  go 1 = Unit 1
  go n = Branch (n - 1) (n - 2)
While this seems to work, I'm not sure this is how it is supposed to be. So, that is the question: what is the correct way to do it?
If you think of a datatype as the fixpoint of a functor then you can see that your definition is the sensible generalisation of the list case.
module Unfold where
Here we start by defining the fixpoint of a functor f: it's a layer of f followed by some more fixpoint:
newtype Fix f = InFix { outFix :: f (Fix f) }
To make things slightly clearer, here are the definitions of the functors corresponding to lists and trees. They have basically the same shape as the datatypes, except that we have replaced the recursive calls by an extra parameter. In other words, they describe what one layer of list / tree looks like and are generic over the possible substructures r.
data ListF a r = LNil | LCons a r
data TreeF a r = TNil | TLeaf a | TBranch r r
Lists and trees are then respectively the fixpoints of ListF and TreeF:
type List a = Fix (ListF a)
type Tree a = Fix (TreeF a)
Anyway, hoping you now have a better intuition about this fixpoint business, we can see that there is a generic way of defining an unfold function for these.
Given an original seed as well as a function taking a seed and building one layer of f where the recursive positions are new seeds, we can build a whole structure:
unfoldFix :: Functor f => (s -> f s) -> s -> Fix f
unfoldFix node = go
  where go = InFix . fmap go . node
This definition specialises to the usual unfold on list or your definition for trees. In other words: your definition was indeed the right one.
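To make that concrete, here is a small, self-contained sketch of the specialisation to the question's tree. It re-states the definitions from above and derives the Functor instance; unfoldTree and example are illustrative names, not from the original answer:
{-# LANGUAGE DeriveFunctor #-}

newtype Fix f = InFix { outFix :: f (Fix f) }

data TreeF a r = TNil | TLeaf a | TBranch r r
  deriving Functor

type Tree a = Fix (TreeF a)

unfoldFix :: Functor f => (s -> f s) -> s -> Fix f
unfoldFix node = go
  where go = InFix . fmap go . node

-- The question's unfold, recovered by instantiating f to TreeF a:
-- Drive s a and TreeF a s are the same functor up to constructor names.
unfoldTree :: (s -> TreeF a s) -> s -> Tree a
unfoldTree = unfoldFix

example :: Tree Int
example = unfoldTree go 5
  where
    go 0 = TNil
    go 1 = TLeaf 1
    go n = TBranch (n - 1) (n - 2)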

Is there an elegant way to have functions return functions of the same type (in a tuple)

I'm using Haskell to implement a pattern involving functions that return a value together with themselves (or a function of the same type). Right now I've implemented this like so:
newtype R a = R (a, a -> R a)

-- some toy functions to demonstrate
alpha :: String -> R String
alpha str
  | str == reverse str = R (str, omega)
  | otherwise          = R (reverse str, alpha)

omega :: String -> R String
omega (s:t:r)
  | s == t    = R (s:t:r, alpha)
  | otherwise = R (s:s:t:r, omega)
The driving force for these types of functions is a function called cascade:
cascade :: (a -> R a) -> [a] -> [a]
cascade _ [] = []
cascade f (l:ls) = el : cascade g ls where
  R (el, g) = f l
Which takes a seed function and a list, and returns a list created by applying the seed function to the first element of the list, applying the function returned by that to the second element of the list, and so on and so forth.
This works. However, in the process of using this for slightly more useful things, I noticed that the basic units were often functions that only rarely returned a function other than themselves, and explicitly declaring a function to return itself was becoming somewhat tedious. I'd rather be able to use something like a Monad's return function; however, I have no idea what bind would do for functions of these types, especially since I never intended them to be linked with anything other than the function they return in the first place.
Trying to shoehorn this into a Monad started worrying me about whether or not what I was doing was useful, so, in short, what I want to know is:
Is what I'm doing a Bad Thing? if not,
Has what I'm doing been done before/am I reinventing the wheel here? if not,
Is there an elegant way to do this, or have I already reached this and am being greedy by wanting some kind of return analogue?
(Incidentally, besides 'functions that return themselves' or 'recursive data structure (of functions)', I'm not quite sure what this kind of pattern is called, which has made effective research difficult. If anyone could give me a name for this pattern (if it indeed has one), that alone would be very helpful.)
As a high-level consideration, I'd say that your type represents a stateful stream transformer. What's a bit confusing here is that your type is defined as
newtype R a = R (a , a -> R a)
instead of
newtype R a = R (a -> (R a, a))
which would be a bit more natural in the streaming context, because you typically don't "produce" something if you haven't received anything yet. Your functions would then have simpler types too:
alpha, omega :: R String
cascade :: R a -> [a] -> [a]
If we try to generalize this idea of a stream transformer, we soon realize that the case where we transform a list of as into a list of as is just a special case. With the proper infrastructure in place we could just as well produce a list of bs. So we try to generalize the type R:
newtype R a b = R (a -> (R a b, b))
I've seen this kind of structure being called a Circuit, which happens to be a full-blown arrow. Arrows are a generalization of the concept of functions and are an even more powerful construct than monads. I can't pretend to understand the category-theoretical background, but it's definitely interesting to play with them. For example, the trivial transformation is just Cat.id:
import Control.Category
import qualified Control.Category as Cat
import Control.Arrow
import Prelude hiding ((.), id)
import qualified Data.List as L

-- ... Definition of Circuit and instances

cascade :: Circuit a b -> [a] -> [b]
cascade cir = snd . L.mapAccumL unCircuit cir
--
ghci> cascade (Cat.id) [1,2,3,4]
[1,2,3,4]
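The answer elides the definition of Circuit and its instances. One standard formulation consistent with the cascade code above (this is an assumption about the elided code, following the usual Arrow-tutorial version) is:
-- A stateful stream transformer: consume one a, produce one b together
-- with the circuit to use for the next input.
newtype Circuit a b = Circuit { unCircuit :: a -> (Circuit a b, b) }

instance Category Circuit where
  id = Circuit $ \a -> (id, a)
  Circuit g . Circuit f = Circuit $ \a ->
    let (f', b) = f a
        (g', c) = g b
    in  (g' . f', c)

instance Arrow Circuit where
  arr f = Circuit $ \a -> (arr f, f a)
  first (Circuit f) = Circuit $ \(a, c) ->
    let (f', b) = f a
    in  (first f', (b, c))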
We can also simulate state by parameterizing the circuit we return as the continuation:
countingCircuit :: (a -> b) -> Circuit a (Int, b)
countingCircuit f = cir 0
  where cir i = Circuit $ \x -> (cir (i+1), (i, f x))
--
ghci> cascade (countingCircuit (+5)) [10,3,2,11]
[(0,15),(1,8),(2,7),(3,16)]
And the fact that our circuit type is a category gives us a nice way to compose circuits:
ghci> cascade (countingCircuit (+5) . arr (*2)) [10,3,2,11]
[(0,25),(1,11),(2,9),(3,27)]
It looks like what you have is a simplified version of a stream, that is to say, a representation of an infinite stream of values. I don't think you can easily define this as a monad, because you use the same type for your seed as for your elements, which makes defining fmap difficult (it seems that you would need to invert the function provided to fmap so as to be able to recover the seed). You can make this a monad by making the seed type independent of the element type, like so:
{-# LANGUAGE ExistentialQuantification #-}

import Control.Monad (ap)

data Stream a = forall s. Stream a s (s -> Stream a)
This will allow you to define Functor and Monad instances as follows (together with the Applicative instance that recent GHCs require):
unfold :: (b -> (a, b)) -> b -> Stream a
unfold f b = Stream a b' (unfold f)
  where (a, b') = f b

shead :: Stream a -> a
shead (Stream a _ _) = a

stail :: Stream a -> Stream a
stail (Stream _ b f) = f b

diag :: Stream (Stream a) -> Stream a
diag = unfold f
  where f str = (shead $ shead str, stail $ fmap stail str)

sjoin :: Stream (Stream a) -> Stream a
sjoin = diag

instance Functor Stream where
  fmap f (Stream a b g) = Stream (f a) b (fmap f . g)

instance Applicative Stream where
  pure  = unfold (\x -> (x, x))
  (<*>) = ap

instance Monad Stream where
  return = pure
  xs >>= f = diag $ fmap f xs
Note that this only obeys the Monad laws when viewed as a set, as it does not preserve element ordering.
This explanation of the stream monad uses infinite lists, which works just as well in Haskell, since they can be generated in a lazy fashion. If you check out the documentation for the Stream type in the vector library, you will find a more complicated version, designed so that it can be used in efficient stream fusion.
I don't have much to add, except to note that your cascade function can be written as a left fold (and hence also as a right fold, though I haven't done the transformation.)
cascade f = reverse . fst . foldl func ([], f)
  where
    func (rs, g) s = let R (r, h) = g s in (r:rs, h)

Catamorphism and tree-traversing in Haskell

I am impatient, looking forward to understanding catamorphism related to this SO question :)
I have only worked through the beginning of the Real World Haskell tutorial, so maybe I'm going to ask for way too much right now; if that is the case, just tell me the concepts I should learn.
Below, I quote the wikipedia code sample for catamorphism.
I would like to know your opinion about foldTree below, a way of traversing a Tree, compared to this other SO question and answer, which also deals with n-ary tree traversal. (Independently of the tree being binary or not, I think the catamorphism below can be written so as to handle n-ary trees.)
I have put what I understand in comments, and would be glad if you could correct me and clarify some things.
{- this is a binary tree definition -}
data Tree a = Leaf a
            | Branch (Tree a) (Tree a)

{- I don't understand the structure between {};
   however, it defines two morphisms, leaf and branch.
   leaf takes an a and returns an r, branch takes two r and returns an r -}
data TreeAlgebra a r = TreeAlgebra { leaf   :: a -> r
                                   , branch :: r -> r -> r }

{- foldTree is a morphism that takes: a TreeAlgebra for Tree a with result r, a Tree a,
   and returns an r -}
foldTree :: TreeAlgebra a r -> Tree a -> r
foldTree a@(TreeAlgebra {leaf = f})   (Leaf x)     = f x
foldTree a@(TreeAlgebra {branch = g}) (Branch l r) = g (foldTree a l) (foldTree a r)
At this point I am having many difficulties. I seem to guess that the morphism leaf will be applied to any Leaf. But to use this code for real, foldTree needs to be fed a defined TreeAlgebra, a TreeAlgebra with a defined morphism leaf, so as to actually do something? In that case, in the foldTree code I would have expected {f = leaf} and not the contrary.
Any clarification from you would be really welcome.
Not exactly sure what you're asking. But yeah, you feed a TreeAlgebra to foldTree corresponding to the computation you want to perform on the tree. For example, to sum all the elements in a tree of Ints you would use this algebra:
sumAlgebra :: TreeAlgebra Int Int
sumAlgebra = TreeAlgebra { leaf   = id
                         , branch = (+) }
Which means, to get the sum of a leaf, apply id (do nothing) to the value in the leaf. To get the sum of a branch, add together the sums of each of the children.
The fact that we can say (+) for branch instead of, say, \x y -> sumTree x + sumTree y is the essential property of the catamorphism. It says that to compute some function f on some recursive data structure it suffices to have the values of f for its immediate children.
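Putting the pieces together, a short usage sketch (sumTree is just an illustrative name):
sumTree :: Tree Int -> Int
sumTree = foldTree sumAlgebra

-- ghci> sumTree (Branch (Leaf 1) (Branch (Leaf 2) (Leaf 3)))
-- 6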
Haskell is a pretty unique language in that we can formalize the idea of catamorphism abstractly. Let's make a data type for a single node in your tree, parameterized over its children:
data TreeNode a child
  = Leaf a
  | Branch child child
See what we did there? We just replaced the recursive children with a type of our choosing. This is so that we can put the subtrees' sums there when we are folding.
Now for the really magical thing. I'm going to write this in pseudohaskell -- writing it in real Haskell is possible, but we have to add some annotations to help the typechecker which can be kind of confusing. We take the "fixed point" of a parameterized data type -- that is, constructing a data type T such that T = TreeNode a T. They call this operator Mu.
type Mu f = f (Mu f)
Look carefully here. The argument to Mu isn't a type, like Int or Foo -> Bar. It's a type constructor like Maybe or TreeNode Int -- the argument to Mu itself takes an argument. (The possibility of abstracting over type constructors is one of the things that makes Haskell's type system really stand out in its expressive power).
So the type Mu f is defined as taking f and filling in its type parameter with Mu f itself. I'm going to define a synonym to reduce some of the noise:
type IntNode = TreeNode Int
Expanding Mu IntNode, we get:
Mu IntNode = IntNode (Mu IntNode)
           = Leaf Int | Branch (Mu IntNode) (Mu IntNode)
Do you see how Mu IntNode is equivalent to your Tree Int? We have just torn the recursive structure apart and then used Mu to put it back together again. This gives us the advantage that we can talk about all Mu types at once. This gives us what we need to define a catamorphism.
Let's define:
type IntTree = Mu IntNode
I said the essential property of the catamorphism is that to compute some function f, it suffices to have the values of f for its immediate children. Let's call the type of the thing we are trying to compute r, and the data structure node (IntNode would be a possible instantiation of this). So to compute r on a particular node, we need the node with its children replaced with their rs. This computation has type node r -> r. So a catamorphism says that if we have one of these computations, then we can compute r for the entire recursive structure (remember recursion is denoted explicitly here with Mu):
cata :: (node r -> r) -> Mu node -> r
Making this concrete for our example, this looks like:
cata :: (IntNode r -> r) -> IntTree -> r
Restating, if we can take a node with rs for its children and compute an r, then we can compute an r for an entire tree.
In order to actually compute this, we need node to be a Functor -- that is we need to be able to map an arbitrary function over the children of a node.
fmap :: (a -> b) -> node a -> node b
This can be done straightforwardly for IntNode.
fmap f (Leaf x) = Leaf x -- has no children, so stays the same
fmap f (Branch l r) = Branch (f l) (f r) -- apply function to each child
Now, finally, we can give a definition for cata (the Functor node constraint just says that node has a suitable fmap):
cata :: (Functor node) => (node r -> r) -> Mu node -> r
cata f t = f (fmap (cata f) t)
I used the parameter name t for the mnemonic value of "tree". This is an abstract, dense definition, but it is really very simple. It says: recursively perform cata f, the computation we are doing over the tree, on each of t's children (which are themselves Mu nodes) to get a node r, and then pass that result to f to compute the result for t itself.
Tying this back to the beginning, the algebra you are defining is essentially a way of defining that node r -> r function. Indeed, given a TreeAlgebra, we can easily get the fold function:
foldFunction :: TreeAlgebra a r -> (TreeNode a r -> r)
foldFunction alg (Leaf a) = leaf alg a
foldFunction alg (Branch l r) = branch alg l r
Thus the tree catamorphism can be defined in terms of our generic one as follows:
type Tree a = Mu (TreeNode a)
treeCata :: TreeAlgebra a r -> (Tree a -> r)
treeCata alg = cata (foldFunction alg)
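For completeness, here is one way the pseudohaskell above can be turned into real, compilable Haskell: a self-contained sketch, assuming a newtype for Mu (InF/outF are illustrative names) and re-stating TreeAlgebra from the question:
{-# LANGUAGE DeriveFunctor #-}

newtype Mu f = InF { outF :: f (Mu f) }

data TreeNode a child
  = Leaf a
  | Branch child child
  deriving Functor

type Tree a = Mu (TreeNode a)

data TreeAlgebra a r = TreeAlgebra { leaf :: a -> r, branch :: r -> r -> r }

cata :: Functor node => (node r -> r) -> Mu node -> r
cata f = f . fmap (cata f) . outF

foldFunction :: TreeAlgebra a r -> TreeNode a r -> r
foldFunction alg (Leaf a)     = leaf alg a
foldFunction alg (Branch l r) = branch alg l r

treeCata :: TreeAlgebra a r -> Tree a -> r
treeCata alg = cata (foldFunction alg)

-- ghci> treeCata (TreeAlgebra { leaf = id, branch = (+) })
--                (InF (Branch (InF (Leaf 1)) (InF (Leaf 2))))
-- 3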
I'm out of time. I know that got really abstract really fast, but I hope it at least gave you a new viewpoint to help your learning. Good luck!
I think you were asking a question about the {}'s. There is an earlier question with a good discussion of {}'s; that is Haskell's record syntax. The other question is why construct the algebra. This is a typical functional paradigm where you generalize data as functions.
The most famous example is Church's construction of the naturals, where f = (+ 1) and z = 0:
0 = z,
1 = f z,
2 = f (f z),
3 = f (f (f z)),
etc...
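A tiny sketch of that idea in Haskell (the names are illustrative):
-- A Church-style natural is "apply f that many times to z".
zero, one, two, three :: (r -> r) -> r -> r
zero  _ z = z
one   f z = f z
two   f z = f (f z)
three f z = f (f (f z))

-- Recover ordinary Ints by choosing f = (+ 1) and z = 0:
-- ghci> three (+ 1) 0
-- 3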
What you are seeing is essentially the same idea being applied to a tree. Work through the Church example and the tree will click.
