I need a parallel (but lazy) version of fmap for Seq from the Data.Sequence package. But the package doesn't export any Seq data constructors, so I can't just wrap it in a newtype and implement Functor directly for the newtype.
Can I do it without rewriting the whole package?
The best you can do is probably to splitAt the sequence into chunks, fmap over each chunk, and then append the pieces back together. Seq is represented as a finger tree, so its underlying structure isn't particularly well suited to parallel algorithms—if you split it up by its natural structure, successive threads will get larger and larger pieces. If you want to give it a go, you can copy the definition of the FingerTree type from the Data.Sequence source, and use unsafeCoerce to convert between it and a Seq. You'll probably want to send the first few Deep nodes to one thread, but then you'll have to think pretty carefully about the rest. Finger trees can be very far from weight-balanced, primarily because 3^n grows asymptotically faster than 2^n; you'll need to take that into account to balance work among threads.
There are at least two sensible ways to split up the sequence, assuming you use splitAt:
Split it all before breaking the computation into threads. If you do this, you should split it from left to right or from right to left, because splitting off small pieces is cheaper than splitting off large ones and then splitting those. You should append the results in a similar fashion; see the sketch after this list.
Split it recursively in multiple threads. This might make sense if you want a lot of pieces or more potential laziness. Split the list near the middle and send each piece to a thread for further splitting and processing.
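For concreteness, here's a minimal sketch of the first approach, chunking from the left with splitAt. The chunk size is a parameter you'd tune, and note that this version forces the chunks, so it gives up the laziness you asked about:

import Control.Parallel.Strategies
  (evalTraversable, parList, rseq, withStrategy)
import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (><))

-- Split off fixed-size chunks from the left, spark each mapped chunk,
-- and append the results back together. Assumes n > 0.
parFmapChunked :: Int -> (a -> b) -> Seq a -> Seq b
parFmapChunked n f =
      foldr (><) Seq.empty                          -- reassemble left to right
    . withStrategy (parList (evalTraversable rseq)) -- one spark per chunk
    . map (fmap f)
    . chunks
  where
    chunks xs
      | Seq.null xs = []
      | otherwise   = let (c, rest) = Seq.splitAt n xs
                      in c : chunks rest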
There's another splitting approach that might be nicer, using the machinery currently used to implement zipWith (see the GitHub ticket I filed requesting chunksOf), but I don't know that you'd get a huge benefit in this application.
The non-strict behavior you seek seems unlikely to work in general. You can probably make it work in many or most specific cases, but I'm not too optimistic that you'll find a totally general approach.
I found a solution, but it's actually not so efficient.
import Control.Parallel.Strategies (Strategy, parTraversable, rpar, using)

-- | A combination of 'parTraversable' and 'fmap', encapsulating a common pattern:
--
-- > parFmap strat f = withStrategy (parTraversable strat) . fmap f
--
parFmap :: Traversable t => Strategy b -> (a -> b) -> t a -> t b
parFmap strat f = (`using` parTraversable strat) . fmap f

-- | Parallel version of '<$>'
(<$|>) :: Traversable t => (a -> b) -> t a -> t b
(<$|>) = parFmap rpar
I would like to define a type for an infinite number sequence in Haskell. My idea is:
type MySeq = Natural -> Ratio Integer
However, I would also like to be able to define some properties of the sequence on the type level. A simple example would be a non-decreasing sequence. Is it possible to do this with the current dependent-type capabilities of GHC?
EDIT: I came up with the following idea:
type PositiveSeq = Natural -> Ratio Natural
data IncreasingSeq = IncreasingSeq {
start :: Ratio Natural,
diff :: PositiveSeq}
type IKnowItsIncreasing = [Ratio Natural]
getSeq :: IncreasingSeq -> IKnowItsIncreasing
getSeq s = scanl (+) (start s) [diff s i | i <- [1..]]
Of course, it's basically a hack and not actually type safe at all.
This isn't doing anything very fancy with types, but you could change how you interpret a sequence of naturals to get essentially the same guarantee.
I think you are thinking along the right lines in your edit to the question. Consider
data IncreasingSeq = IncreasingSeq (Integer -> Ratio Natural)
where each ratio represents how much it has increased from the previous number (starting with 0).
Then you can provide a single function
applyToIncreasing :: ([Ratio Natural] -> r) -> IncreasingSeq -> r
applyToIncreasing f (IncreasingSeq s) = f . drop 1 $ scanl (+) 0 (map s [0..])
This should let you deconstruct it in any way, without allowing the function to inspect the real structure.
You just need a way to construct it: probably a fromList that just sorts it and an insert that performs a standard ordered insertion.
It pains part of me to say this, but I don't think you'd gain anything over this using fancy type tricks: there are only three functions that could ever possibly go wrong, and they are fairly simple to correctly implement. The implementation is hidden so anything that uses those is correct as a result of those functions being correct. Just don't export the data constructor for IncreasingSeq.
I would also suggest considering making [Ratio Natural] be the underlying representation. It simplifies things and guarantees that there are no "gaps" in the sequence (so it is guaranteed to be a sequence).
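With that list representation, the whole trusted interface might look like the following sketch (insertSeq is a made-up name, and fromList only makes sense for finite input, since sort diverges on infinite lists):

import Data.List (insert, sort)
import Data.Ratio (Ratio)
import Numeric.Natural (Natural)

-- The constructor stays hidden; only the functions below are exported,
-- so every IncreasingSeq that can be built is sorted.
newtype IncreasingSeq = IncreasingSeq [Ratio Natural]

fromList :: [Ratio Natural] -> IncreasingSeq
fromList = IncreasingSeq . sort        -- sorting establishes the invariant

insertSeq :: Ratio Natural -> IncreasingSeq -> IncreasingSeq
insertSeq x (IncreasingSeq xs) = IncreasingSeq (insert x xs)  -- ordered insert

applyToIncreasing :: ([Ratio Natural] -> r) -> IncreasingSeq -> r
applyToIncreasing f (IncreasingSeq xs) = f xs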
If you want more safety and can take the performance hit, you can use data Nat = Z | S Nat instead of Natural.
I will say that if this was Coq, or a similar language, instead of Haskell I would be more likely to suggest doing some fancier type-level stuff (depending on what you are trying to accomplish) for a couple reasons:
In systems like Coq, you are usually proving theorems about the code. Because of this, it can be useful to have a type-level proof that a certain property holds. Since Haskell doesn't really have a builtin way to prove those sorts of theorems, the utility diminishes.
On the other hand, we can (sometimes) construct data types that essentially must have the properties we want, using a small number of trusted functions and a hidden implementation. In a system with more theorem-proving capability, like Coq, it might be harder to convince the theorem prover that such a type has the property than if we had used a dependent type (possibly, at least). In Haskell, however, we don't have that issue in the first place.
After writing this article I decided to put my money where my mouth is and started to convert a previous project of mine to use recursion-schemes.
The data structure in question is a lazy kdtree. Please have a look at the implementations with explicit and implicit recursion.
This is mostly a straightforward conversion along the lines of:
data KDTree v a = Node a (KDTree v a) (KDTree v a) | Leaf v a
to
data KDTreeF v a f = NodeF a f f | LeafF v a
Now after benchmarking the whole shebang I find that the KDTreeF version is about two times slower than the normal version (find the whole run here).
Is it just the additional Fix wrapper that slows me down here? And is there anything I could do against this?
Caveats:
At the moment this is specialized to (V3 Double).
This is a catamorphism applied after an anamorphism; a hylomorphism isn't suitable for kdtrees.
I use cata (fmap foo algebra) several times. Is this good practice?
I use Edward Kmett's recursion-schemes package.
Edit 1:
Is this related? https://ghc.haskell.org/trac/ghc/wiki/NewtypeWrappers
Is newtype Fix f = Fix (f (Fix f)) not "free"?
Edit 2:
Just did another bunch of benchmarks. This time I tested tree construction and deconstruction. Benchmark here: https://dl.dropboxusercontent.com/u/2359191/2014-05-15-kdtree-bench-03.html
While the Core output indicates that intermediate data structures are not removed completely, and it is not surprising that the linear searches dominate now, the KDTreeF versions are now slightly faster than the KDTrees. It doesn't matter much, though.
I have just implemented the Thing + ThingF + Base instance variant of the tree. And guess what ... this one is amazingly fast.
I was under the impression that this one would be the slowest of all variants. I really should have read my own post ... the line where I write:
there is no trace of the TreeF structure to be found
Let the numbers speak for themselves, kdtreeu is the new variant. The results are not always as clear as for these cases, but in most cases they are at least as fast as the explicit recursion (kdtree in the benchmark).
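For reference, the Thing + ThingF + Base pattern looks roughly like this (a sketch using the current recursion-schemes class names, Recursive and Corecursive; older releases called these classes Foldable and Unfoldable):

{-# LANGUAGE DeriveFunctor, TypeFamilies #-}
import Data.Functor.Foldable (Base, Corecursive (..), Recursive (..))

data KDTree v a = Node a (KDTree v a) (KDTree v a) | Leaf v a

data KDTreeF v a f = NodeF a f f | LeafF v a
  deriving (Functor)

type instance Base (KDTree v a) = KDTreeF v a

-- cata and ana work on the plain KDTree via project/embed, so the
-- stored structure never mentions Fix.
instance Recursive (KDTree v a) where
  project (Node a l r) = NodeF a l r
  project (Leaf v a)   = LeafF v a

instance Corecursive (KDTree v a) where
  embed (NodeF a l r) = Node a l r
  embed (LeafF v a)   = Leaf v a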
I wasn't using recursion schemes, but rather my own "hand-rolled" cata, ana, Fix/unFix to do generation of (lists of) and evaluation of programs in a small language in the hope of finding one that matched a list of (input, output) pairs.
In my experience, cata optimized better than direct recursion and gave a speed boost. Also IME, ana prevented stack overflow errors that my naive generator was causing, but that may have centered around generation of the final list.
So, my answer would be that no, they aren't always slower, but I don't see any obvious problems; so they may simply be slower in your case. It's also possible that recursion-schemes itself is just not optimized for speed.
I'm working on implementing the UCT algorithm in Haskell, which requires a fair amount of data juggling. Without getting into too much detail, it's a simulation algorithm where, at each "step," a leaf node in the search tree is selected based on some statistical properties, a new child node is constructed at that leaf, and the stats corresponding to the new leaf and all of its ancestors are updated.
Given all that juggling, I'm not really sharp enough to figure out how to make the whole search tree a nice immutable data structure à la Okasaki. Instead, I've been playing around with the ST monad a bit, creating structures composed of mutable STRefs. A contrived example (unrelated to UCT):
import Control.Monad
import Control.Monad.ST
import Data.STRef
data STRefPair s a b = STRefPair { left :: STRef s a, right :: STRef s b }
mkStRefPair :: a -> b -> ST s (STRefPair s a b)
mkStRefPair a b = do
a' <- newSTRef a
b' <- newSTRef b
return $ STRefPair a' b'
derp :: (Num a, Num b) => STRefPair s a b -> ST s ()
derp p = do
modifySTRef (left p) (\x -> x + 1)
modifySTRef (right p) (\x -> x - 1)
herp :: (Num a, Num b) => (a, b)
herp = runST $ do
p <- mkStRefPair 0 0
replicateM_ 10 $ derp p
a <- readSTRef $ left p
b <- readSTRef $ right p
return (a, b)
main = print herp -- should print (10, -10)
Obviously this particular example would be much easier to write without using ST, but hopefully it's clear where I'm going with this... if I were to apply this sort of style to my UCT use case, is that wrong-headed?
Somebody asked a similar question here a couple years back, but I think my question is a bit different... I have no problem using monads to encapsulate mutable state when appropriate, but it's that "when appropriate" clause that gets me. I'm worried that I'm reverting to an object-oriented mindset prematurely, where I have a bunch of objects with getters and setters. Not exactly idiomatic Haskell...
On the other hand, if it is a reasonable coding style for some set of problems, I guess my question becomes: are there any well-known ways to keep this kind of code readable and maintainable? I'm sort of grossed out by all the explicit reads and writes, and especially grossed out by having to translate from my STRef-based structures inside the ST monad to isomorphic but immutable structures outside.
I don't use ST much, but sometimes it is just the best solution. This can be in many scenarios:
There are already well-known, efficient ways to solve a problem. Quicksort is a perfect example of this. It is known for its speed and in-place behavior, which cannot be imitated by pure code very well.
You need rigid time and space bounds. Especially with lazy evaluation (and Haskell doesn't even specify whether there is lazy evaluation, just that it is non-strict), the behavior of your programs can be very unpredictable. Whether there is a memory leak could depend on whether a certain optimization is enabled. This is very different from imperative code, which has a fixed set of variables (usually) and defined evaluation order.
You've got a deadline. Although the pure style is almost always better practice and cleaner code, if you are used to writing imperatively and need the code soon, starting imperative and moving to functional later is a perfectly reasonable choice.
When I do use ST (and other monads), I try to follow these general guidelines:
Use Applicative style often. This makes the code easier to read and, if you do switch to an immutable version, much easier to convert. Not only that, but Applicative style is much more compact.
Don't just use ST. If you program only in ST, the result will be no better than a huge C program, possibly worse because of the explicit reads and writes. Instead, intersperse pure Haskell code where it applies. I often find myself using things like STRef s (Map k [v]). The map itself is being mutated, but much of the heavy lifting is done purely (see the sketch after this list).
Don't remake libraries if you don't have to. A lot of code written for IO can be cleanly, and fairly mechanically, converted to ST. Replacing all the IORefs with STRefs and IOs with STs in Data.HashTable was much easier than writing a hand-coded hash table implementation would have been, and probably faster too.
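For example, the STRef s (Map k [v]) pattern mentioned above might look like this (a sketch; newIndex and addEntry are made-up names):

import Control.Monad.ST (ST)
import qualified Data.Map as Map
import Data.Map (Map)
import Data.STRef (STRef, modifySTRef', newSTRef)

-- One mutable cell holding a pure map: only the ref mutates, while
-- insertWith and the list cons are ordinary pure code.
newIndex :: ST s (STRef s (Map k [v]))
newIndex = newSTRef Map.empty

addEntry :: Ord k => STRef s (Map k [v]) -> k -> v -> ST s ()
addEntry ref k v = modifySTRef' ref (Map.insertWith (++) k [v])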
One last note - if you are having trouble with the explicit reads and writes, there are ways around it.
Algorithms which make use of mutation and algorithms which do not are different algorithms. Sometimes there is a straightforward bounds-preserving translation from the former to the latter, sometimes a difficult one, and sometimes only one which does not preserve complexity bounds.
From a skim of the paper, I don't think it makes essential use of mutation -- and so I think a potentially really nifty lazy functional algorithm could be developed. But it would be a different, though related, algorithm to the one described.
Below, I describe one such approach -- not necessarily the best or most clever, but pretty straightforward:
Here's the setup as I understand it -- A) a branching tree is constructed, B) payoffs are then pushed back from the leaves to the root, which then indicates the best choice at any given step. But this is expensive, so instead only portions of the tree are explored to the leaves in a nondeterministic manner. Furthermore, each further exploration of the tree is determined by what's been learned in previous explorations.
So we build code to describe the "stage-wise" tree. Then, we have another data structure to define a partially explored tree along with partial reward estimates. We then have a function of randseed -> ptree -> ptree that, given a random seed and a partially explored tree, embarks on one further exploration of the tree, updating the ptree structure as we go. Then, we can just iterate this function, starting from an empty ptree, to get a list of increasingly sampled ptrees. We then can walk this list until some specified cutoff condition is met.
So now we've gone from one algorithm where everything is blended together to three distinct steps -- 1) building the whole state tree, lazily, 2) updating some partial exploration with some sampling of a structure and 3) deciding when we've gathered enough samples.
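In code, the skeleton of steps 2 and 3 might look like this (a sketch; every name is a placeholder, and it diverges if the cutoff is never met):

-- Step 1, the lazily built state tree, would live inside the ptree type.
searchToCutoff :: (ptree -> Bool)           -- cutoff condition
               -> (seed -> ptree -> ptree)  -- one further exploration
               -> [seed]                    -- stream of random seeds
               -> ptree                     -- initial (empty) ptree
               -> ptree
searchToCutoff done explore seeds empty =
    head . dropWhile (not . done) $ scanl (flip explore) empty seeds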
It can be really difficult to tell when using ST is appropriate. I would suggest you do it with ST and without ST (not necessarily in that order). Keep the non-ST version simple; using ST should be seen as an optimization, and you don't want to do that until you know you need it.
I have to admit that I cannot read the Haskell code. But if you use ST for mutating the tree, then you can probably replace this with an immutable tree without losing much because:
Same complexity for mutable and immutable tree
You have to mutate every node above the new leaf. An immutable tree has to replace all nodes above the modified node. So in both cases the touched nodes are the same, thus you don't gain anything in complexity.
In Java, for example, object creation is more expensive than mutation, so maybe you can gain a bit here in Haskell by using mutation. But I don't know this for sure, and a small gain does not buy you much because of the next point.
Updating the tree is presumably not the bottleneck
The evaluation of the new leaf will probably be much more expensive than updating the tree. At least this is the case for UCT in computer Go.
Use of the ST monad is usually (but not always) as an optimization. For any optimization, I apply the same procedure:
Write the code without it,
Profile and identify bottlenecks,
Incrementally rewrite the bottlenecks and test for improvements/regressions.
The other use case I know of is as an alternative to the state monad. The key difference being that with the state monad the type of all of the data stored is specified in a top-down way, whereas with the ST monad it is specified bottom-up. There are cases where this is useful.
I'm making a Fibonacci heap implementation in Haskell, and I'm not sure exactly what the clean way to do it is.
For example, I want to order the nodes. So I can do something like:
instance Ord (FibNode e) where
f1 `compare` f2 = (key f1) `compare` (key f2)
This would be more easily done if I made FibNode a monad. But other times I want to fold across the node's siblings, or fold across their children, etc. So defining a Functor instance where fmap f x = f $ key x won't work all the time.
Apart from defining my own fmapKey, fmapSibs, fmapKids... is there a way to do this?
You can't make a type constructor an instance of Functor in more than one way.
You can, however, make various newtype wrappers around your type, each with its own Functor instance. But that's not really any more convenient than defining your own fmap functions.
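For what it's worth, the same newtype trick is perhaps a better fit for Foldable, which is closer to the folds you describe. A sketch with a simplified, hypothetical FibNode (children only, no sibling list):

data FibNode e = FibNode { key :: e, kids :: [FibNode e] }

-- Fold over the keys root-first.
newtype PreOrder e = PreOrder (FibNode e)

instance Foldable PreOrder where
  foldr f z (PreOrder (FibNode k ks)) =
    f k (foldr (\n acc -> foldr f acc (PreOrder n)) z ks)

-- Fold over the keys children-first.
newtype PostOrder e = PostOrder (FibNode e)

instance Foldable PostOrder where
  foldr f z (PostOrder (FibNode k ks)) =
    foldr (\n acc -> foldr f acc (PostOrder n)) (f k z) ks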
The (Fibonacci) heap is an inherently impure, destructively-updated data structure with specific asymptotic runtime guarantees for various operations on it. I would be surprised if you could maintain these guarantees with a straightforward translation to a pure version. My suggestion would be to make it an impure data structure, using something like ST or IO arrays. This would make for a much more direct implementation of the classic algorithms.
I'm working on a small concept project in Haskell which requires a circular buffer. I've managed to create a buffer using arrays which has O(1) rotation, but of course requires O(N) for insertion/deletion. I've found an implementation using lists which appears to take O(1) for insertion and deletion, but since it maintains a left and right list, crossing a certain border when rotating will take O(N) time. In an imperative language, I could implement a doubly linked circular buffer with O(1) insertion, deletion, and rotation. I'm thinking this isn't possible in a purely functional language like Haskell, but I'd love to know if I'm wrong.
If you can deal with amortized O(1) operations, you could probably use either Data.Sequence from the containers package, or Data.Dequeue from the dequeue package. The former uses finger trees, while the latter uses the "Banker's Dequeue" from Okasaki's Purely Functional Data Structures (a prior version online here).
The ST monad allows you to describe and execute imperative algorithms in Haskell. You can use STRefs for the mutable pointers of your doubly linked list.
Self-contained algorithms described using ST are executed using runST. Different runST executions may not share ST data structures (STRef, STArray, ..).
If the algorithm is not "self contained" and the data structure is required to be maintained with IO operations performed in between its uses, you can use stToIO to access it in the IO monad.
Regarding whether this is purely functional or not - I guess it's not?
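For concreteness, the STRef-based doubly linked node mentioned above might look like this (a sketch; all names are made up, and a real ring would also track a distinguished head node):

import Control.Monad.ST (ST)
import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

data DLNode s a = DLNode
  { value :: a
  , prev  :: STRef s (DLNode s a)
  , next  :: STRef s (DLNode s a)
  }

-- A one-element ring pointing at itself; the placeholder refs are
-- overwritten before anything reads them.
singleton :: a -> ST s (DLNode s a)
singleton x = do
  p <- newSTRef undefined
  n <- newSTRef undefined
  let node = DLNode x p n
  writeSTRef p node
  writeSTRef n node
  return node

-- O(1) insertion after a given node.
insertAfter :: DLNode s a -> a -> ST s (DLNode s a)
insertAfter node x = do
  nxt <- readSTRef (next node)
  p   <- newSTRef node
  n   <- newSTRef nxt
  let new = DLNode x p n
  writeSTRef (next node) new
  writeSTRef (prev nxt)  new
  return new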
It sounds like you might need something a bit more complicated than this (since you mentioned doubly-linked lists), but maybe this will help. This function acts like map over a mutable cyclic list:
mapOnCycling f = concat . tail . iterate (map f)
Use like:
*Main> (+1) `mapOnCycling` [3,2,1]
[4,3,2,5,4,3,6,5,4,7,6,5,8,7,6,9,8,7,10,9...]
And here's one that acts like mapAccumL:
import Data.List (mapAccumL)

mapAccumLOnCycling f acc xs =
    let (acc', xs') = mapAccumL f acc xs
    in xs' ++ mapAccumLOnCycling f acc' xs'
Anyway, if you care to elaborate even more on what exactly your data structure needs to be able to "do" I would be really interested in hearing about it.
EDIT: as camccann mentioned, you can use Data.Sequence for this, which according to the docs should give you O(1) time complexity (is there such a thing as amortized O(1) time?) for viewing or adding elements at both the left and right ends of the sequence, as well as modifying the ends along the way. Whether this will have the performance you need, I'm not sure.
You can treat the "current location" as the left end of the Sequence. Here we shuttle back and forth along a sequence, producing an infinite list of values. Sorry if it doesn't compile, I don't have GHC at the moment:
{-# LANGUAGE ViewPatterns #-}
import Data.Sequence (Seq, ViewL (..), ViewR (..), viewl, viewr, (<|), (|>))

-- The left end of the Seq is the "current location".
shuttle :: Seq Int -> [Int]
shuttle (viewl -> a :< as) = a : shuttle (rotate (a + 1 <| as))
  where
    rotate | even a    = rotateForward
           | otherwise = rotateBack
    -- Both are only ever applied to non-empty sequences.
    rotateBack    (viewr -> as' :> a') = a' <| as'
    rotateForward (viewl -> a' :< as') = as' |> a'
shuttle _ = []