Apply function to all pairs efficiently - haskell

I need a higher-order function pairApply that applies a binary function f to all unique pairs of a list-like structure and then combines the results somehow. An example / sketch:
pairApply (+) f [a, b, c] = f a b + f a c + f b c
Some research leads me to believe that Data.Vector.Unboxed will probably have good performance (I will also need fast access to specific elements); it is also necessary for Statistics.Sample, which would come in handy further down the line.
With this in mind I have the following, which almost compiles:
import qualified Data.Vector.Unboxed as U

pairElement :: (U.Unbox a, U.Unbox b)
            => U.Vector a
            -> (a -> a -> b)
            -> Int
            -> a
            -> U.Vector b
pairElement v f idx el = U.map (f el) $ U.drop (idx + 1) v

pairUp :: (U.Unbox a, U.Unbox b)
       => (a -> a -> b)
       -> U.Vector a
       -> U.Vector (U.Vector b)
pairUp f v = U.imap (pairElement v f) v

pairApply :: (U.Unbox a, U.Unbox b)
          => (b -> b -> b)
          -> b
          -> (a -> a -> b)
          -> U.Vector a
          -> b
pairApply combine neutral f v =
    folder $ U.map folder (pairUp f v)
  where
    folder = U.foldl combine neutral
The reason this doesn't compile is that there is no Unbox instance for U.Vector (U.Vector a). I have been able to create new unboxed instances in other cases using Data.Vector.Unboxed.Deriving, but I'm not sure it would be so easy in this case (transform it to a tuple pair where the first element is all the inner vectors concatenated and the second is the lengths of the vectors, so we know how to unpack?)
My question can be stated in two parts:
Does the above implementation make sense at all or is there some quick library function magic etc that could do it much easier?
If so, is there a better way to make an unboxed vector of vectors than the one sketched above?
Note that I'm aware that foldl is probably not the best choice; once I've got the implementation sorted I plan to benchmark with a few different folds.

There is no way to define a classical instance for Unbox (U.Vector b), because that would require preallocating a memory area in which each element (i.e. each subvector!) has the same fixed amount of space. But in general, each of them may be arbitrarily big, so that's not feasible at all.
It might in principle be possible to define that instance by storing only a flattened form of the nested vector plus an extra array of indices (where each subvector starts). I once briefly gave this a try; it actually seems somewhat promising as far as immutable vectors are concerned, but a G.Vector instance also requires a mutable implementation, and that's hopeless for such an approach (because any mutation that changes the number of elements in one subvector would require shifting everything behind it).
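For illustration, the immutable half of that idea could look something like the following sketch (names and representation are mine, not a real vector instance): the nested structure is kept as one flat data vector plus a vector of start offsets, and indexing a subvector is just an O(1) slice.
data Nested a = Nested
  { flat   :: U.Vector a    -- all subvectors concatenated
  , starts :: U.Vector Int  -- start index of each subvector in 'flat'
  }

-- indexNested n i extracts the i-th subvector as a slice of the flat data.
indexNested :: U.Unbox a => Nested a -> Int -> U.Vector a
indexNested (Nested d s) i = U.slice from (to - from) d
  where
    from = s U.! i
    to   = if i + 1 < U.length s then s U.! (i + 1) else U.length d
The trouble, as described above, starts as soon as you need the mutable counterpart of this.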
Usually, it's just not worth it, because if the individual element vectors aren't very small the overhead of boxing them won't matter, i.e. often it makes sense to use B.Vector (U.Vector b).
For your application, however, I would not do that at all – there's no need to ever wrap the upper element choices in a single triangular array. (And it would be really bad for performance to do that, because it makes the algorithm take O(n²) memory rather than the O(n) which is all that's needed.)
I would just do the following:
pairApply combine neutral f v
  = U.ifoldl' (\acc i p -> U.foldl' (\acc' q -> combine acc' $ f p q)
                                    acc
                                    (U.drop (i+1) v) )
              neutral v
This corresponds pretty much to the obvious nested-loops imperative implementation
pairApply(combine, b, f, v):
  for(i in 0..length(v)-1):
    for(j in i+1..length(v)-1):
      b = combine(b, f(v[i], v[j]));
  return b;
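For completeness, here is roughly how that fold looks as a standalone compiling definition with the question's signature (a sketch I haven't benchmarked; main is just a toy check):
import qualified Data.Vector.Unboxed as U

pairApply :: U.Unbox a => (b -> b -> b) -> b -> (a -> a -> b) -> U.Vector a -> b
pairApply combine neutral f v =
  U.ifoldl' (\acc i p ->
               U.foldl' (\acc' q -> combine acc' (f p q)) acc (U.drop (i + 1) v))
            neutral v

main :: IO ()
main = print (pairApply (+) 0 (*) (U.fromList [1, 2, 3, 4 :: Int]))
-- sum of products over all unique pairs: 1*2 + 1*3 + 1*4 + 2*3 + 2*4 + 3*4 = 35
Note that only U.Unbox a is required, since b only ever lives in the accumulator, never in a vector.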

My answer is basically the same as leftaroundabout's nested-loops imperative implementation:
pairApply :: (Int -> Int -> Int) -> Vector Int -> Int
pairApply f v = foldl' (+) 0 [f (v ! i) (v ! j) | i <- [0..(n-1)], j <- [(i+1)..(n-1)]]
  where n = length v
As far as I can tell, there is no performance issue with this implementation.
Non-polymorphic for simplicity.

Related

Sharing vs. non-sharing fixed-point combinator

This is the usual definition of the fixed-point combinator in Haskell:
fix :: (a -> a) -> a
fix f = let x = f x in x
On https://wiki.haskell.org/Prime_numbers, they define a different fixed-point combinator:
_Y :: (t -> t) -> t
_Y g = g (_Y g) -- multistage, non-sharing, g (g (g (g ...)))
-- g (let x = g x in x) -- two g stages, sharing
_Y is a non-sharing fixpoint combinator, here arranging for a recursive "telescoping" multistage primes production (a tower of producers).
What exactly does this mean? What is "sharing" vs. "non-sharing" in that context? How does _Y differ from fix?
"Sharing" means f x re-uses the x that it creates; but with _Y g = g . g . g . g . ..., each g calculates its output anew (cf. this and this).
In that context, the sharing version has much worse memory usage and leads to a space leak.1
The definition of _Y mirrors the usual lambda calculus definition's effect for the Y combinator, which emulates recursion by duplication, while true recursion refers to the same (hence, shared) entity.
In
x = f x
(_Y g) = g (_Y g)
both xs refer to the same entity, but each of the (_Y g)s refers to an equivalent, but separate, entity. That's the intention, anyway.
Of course thanks to referential transparency there's no guarantee in Haskell the language for any of this. But GHC the compiler does behave this way.
_Y g is a common sub-expression and it could be "eliminated" by a compiler by giving it a name and reusing that named entity, subverting its whole purpose. That's why GHC has the "no common sub-expression elimination" flag, -fno-cse, which prevents this explicitly. It used to be that you had to use this flag to achieve the desired behaviour here, but not anymore: more recent versions of GHC (for several years now) are no longer as aggressive about common sub-expression elimination.
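A tiny way to see the operational difference (my own example, not from the wiki page, and assuming GHC's usual behaviour with no CSE kicking in): with fix, the let x = f x in x knot-tying means fix (1:) is a single cons cell whose tail points back to itself, while _Y (1:) allocates a fresh cell for every element you demand.
import Data.Function (fix)

_Y :: (t -> t) -> t
_Y g = g (_Y g)

onesShared, onesRebuilt :: [Int]
onesShared  = fix (1:)  -- one self-referential cons cell; traversing it is constant space
onesRebuilt = _Y  (1:)  -- unfolds to 1 : _Y (1:); each demanded element is a fresh cell

main :: IO ()
main = print (take 5 onesShared, take 5 onesRebuilt)  -- both print [1,1,1,1,1]
The two lists are indistinguishable as values; only the memory behaviour differs.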
disclaimer: I'm the author of that part of the page you're referring to. Was hoping for the back-and-forth that's usual on wiki pages, but it never came, so my work didn't get reviewed like that. Either no-one bothered, or it is passable (lacking major errors). The wiki seems to be largely abandoned for many years now.
1 The g function involved,
(3:) . minus [5,7..] . foldr (\(x:xs) -> (x:) . union xs) []
     . map (\p -> [p*p, p*p + 2*p ..])
produces an increasing stream of all odd primes given an increasing stream of all odd primes. To produce a prime N in value, it consumes its input stream up to the first prime above sqrt(N) in value, at least. Thus the production points are given roughly by repeated squaring, and there are ~ log (log N) of such g functions in total in the chain (or "tower") of these primes producers, each immediately garbage collectible, the lowest one producing its primes given just the first odd prime, 3, known a priori.
And with the two-staged _Y2 g = g x where { x = g x } there would be only two of them in the chain, but only the top one would be immediately garbage collectible, as discussed at the referenced link above.
_Y is translated to the following STG:
_Y f = let x = _Y f in f x
fix is translated identically to the Haskell source:
fix f = let x = f x in x
So fix f sets up a recursive thunk x and returns it, while _Y is a recursive function, and importantly it’s not tail-recursive. Forcing _Y f enters f, passing a new call to _Y f as an argument, so each recursive call sets up a new thunk; forcing the x returned by fix f enters f, passing x itself as an argument, so each recursive call is into the same thunk—this is what’s meant by “sharing”.
The sharing version usually has better memory usage, and also lets the GHC RTS detect some kinds of infinite loop. When a thunk is forced, before evaluation starts, it’s replaced with a “black hole”; if at any point during evaluation of a thunk a black hole is reached from the same thread, then we know we have an infinite loop and can throw an exception (which you may have seen displayed as Exception: <<loop>>).
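A concrete way to observe that detection (my example, assuming a standard GHC RTS): forcing fix id creates a thunk that depends directly on itself, so the RTS typically reports <<loop>>, whereas _Y id just keeps calling itself with freshly allocated thunks and never terminates.
import Data.Function (fix)

main :: IO ()
main = print (fix id :: Int)
-- GHC typically aborts with: Exception: <<loop>>
-- By contrast, print (_Y id :: Int) would simply spin forever, because each
-- recursive call builds a new thunk instead of re-entering the same one.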
I think you already received excellent answers, from a GHC/Haskell perspective. I just wanted to chime in and add a few historical/theoretical notes.
The correspondence between unfolding and cyclic views of recursion is rigorously studied in Hasegawa's PhD thesis: https://www.springer.com/us/book/9781447112211
(Here's a shorter paper that you can read without paying Springer: https://link.springer.com/content/pdf/10.1007%2F3-540-62688-3_37.pdf)
Hasegawa assumes a traced monoidal category, a requirement that is much less stringent than the usual PCPO assumption of domain theory, which forms the basis of how we think about Haskell in general. What Hasegawa showed was that one can define these "sharing" fixed point operators in such a setting, and established that they correspond to the usual unfolding view of fixed points from Church's lambda-calculus. That is, there is no way to tell them apart by making them produce different answers.
Hasegawa's correspondence holds for what's known as central arrows; i.e., when there are no "effects" involved. Later on, Benton and Hyland extended this work and showed that the correspondence holds for certain cases when the underlying arrow can perform "mild" monadic effects as well: https://pdfs.semanticscholar.org/7b5c/8ed42a65dbd37355088df9dde122efc9653d.pdf
Unfortunately, Benton and Hyland only allow effects that are quite "mild": Effects like the state and environment monads fit the bill, but not general effects like exceptions, lists, or IO. (The fixed point operators for these effectful computations are known as mfix in Haskell, with the type signature (a -> m a) -> m a, and they form the basis of the recursive-do notation.)
It's still an open question how to extend this work to cover arbitrary monadic effects. Though it doesn't seem to be receiving much attention these days. (Would make a great PhD topic for those interested in the correspondence between lambda-calculus, monadic effects, and graph-based computations.)

what's the relationship between Monad and single threaded?

While studying Monads I wanted to know which path is best for understanding Haskell's Monad. Many people, such as Bartosz Milewski, have suggested that "Monads for functional programming" is the best material. After reading part of this paper I got the same feeling, but in 4.2 Array transforms I have no idea how to understand the summary about Monad, as I am missing some foundation for the bottom part of page 16:
"Making M into an abstract data type guarantees that single threading is
preserved, and hence it is safe to implement assignment with an in-place update.
The use of data abstraction is essential for this purpose. Otherwise, one could
write programs such as (\x -> (assign i v x ; assign i w x )) that violate the single threading property."
I don't know why Philip Wadler discusses single threading here. The type M a = State -> (a, State) must be very important for guaranteeing single threading, but why?
To explore this I implemented the code of section 4.2 Array transforms, where I assume my array looks like Arr [("ok", 0), ("no", 1)]: the index is a String and the value an Int:
type M a = State -> (a, State)

data Arr = Arr [(Id, Val)] deriving (Show)
type State = Arr
type Id = String
type Val = Int
type Ix = Id

update ix val arr = updateNew ix val arr (Arr [])
  where updateNew ix val (Arr (x:xs)) (Arr newArr) =
          case fst x == ix of
            True  -> Arr (newArr ++ ((ix, val) : xs))
            False -> updateNew ix val (Arr xs) (Arr (newArr ++ [x]))

assign :: Ix -> Val -> M ()
assign i v = \x -> ((), update i v x)
But this has not helped me understand the summary above. I hope someone can explain it in more detail!
In Haskell, something like [("ok", 0), ("no", 1)] is not an array*, but rather a list. Haskell lists are immutable, so you don't even have to think about them changing. Arrays are another story. There are actually two very different things, both called arrays: immutable arrays and mutable arrays.
Immutable arrays are just alternative representations of certain sorts of functions along with some information about their domains.
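For instance (a small sketch using the standard array package), an immutable array is built once from its contents and thereafter behaves like a fast lookup function:
import Data.Array (Array, listArray, (!))

-- essentially a tabulated function from indices 0..4 to their squares
squares :: Array Int Int
squares = listArray (0, 4) [i * i | i <- [0 .. 4]]
-- squares ! 3 == 9, and the array itself can never change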
Wadler is discussing mutable arrays, which can actually be changed. We don't actually handle these arrays directly; rather, we deal with values that serve as pointers to them. In languages like ML, Java, C, etc., you can "follow" a pointer any time you have one to access or modify the value(s) it points to. But that would completely break Haskell's referential transparency, which is critical to both understanding and optimizing it.
So what we do instead is encapsulate the changes to an array within an abstract monad. All sorts of things are going on under the hood that break the rules, but what gets exposed to you, the programmer, is guaranteed to make sense. There are actually two monads that can support mutable arrays in GHC: IO and ST s. ST s lets you, in a pure function, make an array, mutate it all sorts of ways, and then produce a pure result. IO, on the other hand, lets you intermix array creation and modifications with other IO actions.
* In GHC, it might be an array, because GHC offers an extension called OverloadedLists, but even in GHC it's very unlikely to be an array.
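To make the ST s story concrete, here is a small sketch (my example, using the array package's STUArray) of a pure function that mutates an array internally and never lets the mutation escape:
import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray)

-- Count how many times each bucket 0..nBuckets-1 occurs in the input.
-- All the in-place writes happen inside runSTUArray; the result is pure.
histogram :: Int -> [Int] -> UArray Int Int
histogram nBuckets xs = runSTUArray $ do
  arr <- newArray (0, nBuckets - 1) 0
  forM_ xs $ \x -> do
    old <- readArray arr x
    writeArray arr x (old + 1)
  return arr
From the caller's point of view, histogram is an ordinary pure function; the single-threaded use of the array is enforced by the abstract ST monad, which is exactly Wadler's point about data abstraction.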

Is there a way to avoid copying the whole search path of a binary tree on insert?

I've just started working my way through Okasaki's Purely Functional Data Structures, but have been doing things in Haskell rather than Standard ML. However, I've come across an early exercise (2.5) that's left me a bit stumped on how to do things in Haskell:
Inserting an existing element into a binary search tree copies the entire search path
even though the copied nodes are indistinguishable from the originals. Rewrite insert using exceptions to avoid this copying. Establish only one handler per insertion rather than one handler per iteration.
Now, my understanding is that ML, being an impure language, gets by with a conventional approach to exception handling not so different from, say, Java's, so you can accomplish it with something like this:
datatype Tree = E | T of Tree * int * Tree

exception ElementPresent

fun insert (x, t) =
  let fun go E = T (E, x, E)
        | go (T (l, y, r)) =
            if x < y then T (go l, y, r)
            else if y < x then T (l, y, go r)
            else raise ElementPresent
  in go t
  end
  handle ElementPresent => t
I don't have an ML implementation, so this may not be quite right in terms of the syntax.
My issue is that I have no idea how this can be done in Haskell, outside of doing everything in the IO monad, which seems like cheating and even if it's not cheating, would seriously limit the usefulness of a function which really doesn't do any mutation. I could use the Maybe monad:
data Tree a = Empty | Fork (Tree a) a (Tree a)
    deriving (Show)

insert :: (Ord a) => a -> Tree a -> Tree a
insert x t = maybe t id (go t)
  where go Empty = return (Fork Empty x Empty)
        go (Fork l y r)
          | x < y     = do l' <- go l; return (Fork l' y r)
          | x > y     = do r' <- go r; return (Fork l y r')
          | otherwise = Nothing
This means everything winds up wrapped in Just on the way back up when the element isn't found, which requires more heap allocation, and sort of defeats the purpose. Is this allocation just the price of purity?
EDIT to add: A lot of why I'm wondering about the suitability of the Maybe solution is that the optimization described only seems to save you all the constructor calls you would need in the case where the element already exists, which means heap allocations proportional to the length of the search path. The Maybe also avoids those constructor calls when the element already exists, but then you get a number of Just constructor calls equal to the length of the search path. I understand that a sufficiently smart compiler could elide all the Just allocations, but I don't know if, say, the current version of GHC is really that smart.
In terms of cost, the ML version is actually very similar to your Haskell version.
Every recursive call in the ML version results in a stack frame. The same is true in the
Haskell version. This is going to be proportional in size to the path that you traverse in
the tree. Also, both versions will of course allocate new nodes for the entire path if an insertion is actually performed.
In your Haskell version, every recursive call might also eventually result in the
allocation of a Just node. This will go on the minor heap, which is just a block of
memory with a bump pointer. For all practical purposes, GHC's minor heap is roughly equivalent in
cost to the stack. Since these are short-lived allocations, they won't normally end up
being moved to the major heap at all.
GHC generally cannot elide path copying in cases like that. However, there is a way to do it manually, without incurring any of the indirection/allocation costs of Maybe. Here it is:
{-# LANGUAGE MagicHash #-}
import GHC.Prim (reallyUnsafePtrEquality#)

data Tree a = Empty | Fork (Tree a) a (Tree a)
    deriving (Show)

insert :: (Ord a) => a -> Tree a -> Tree a
insert x Empty = Fork Empty x Empty
insert x node@(Fork l y r)
  | x < y = let l' = insert x l in
      case reallyUnsafePtrEquality# l l' of
        1# -> node
        _  -> Fork l' y r
  | x > y = let r' = insert x r in
      case reallyUnsafePtrEquality# r r' of
        1# -> node
        _  -> Fork l y r'
  | otherwise = node
The pointer equality function does exactly what's in the name. Here it is safe because even if the equality returns a false negative we only do a bit of extra copying, and nothing worse happens.
It's not the most idiomatic or prettiest Haskell, but the performance benefits can be significant. In fact, this trick is used very frequently in unordered-containers.
As fizruk indicates, the Maybe approach is not significantly different from what you'd get in Standard ML. Yes, the whole path is copied, but the new copy is discarded if it turns out not to be needed. The Just constructor itself may not even be allocated on the heap—it can't escape from insert, let alone the module, and you don't do anything weird with it, so the compiler is free to analyze it to death.
Edit
There are efficiency problems, now that I think of it. Your use of Maybe conceals the fact that you're actually making two passes—one down to find the insertion point and one up to build the tree. The solution to this is to drop Maybe Tree in favor of (Tree,Bool) and use strictness annotations, or to switch to continuation-passing style. Also, if you choose to stay with the three-way logic, you may want to use the three-way comparison function. Alternatively, you can go all the way to the bottom each time and check later if you hit a duplicate.
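For what it's worth, here is one way the continuation-passing suggestion could look (my sketch, not the answer's code): the continuation describes how to rebuild the path, and hitting a duplicate simply returns the original tree, so neither Just nor the copied path is built in that case (the continuation closures themselves are still allocated on the way down).
insertCPS :: Ord a => a -> Tree a -> Tree a
insertCPS x t = go t id
  where
    go Empty        k = k (Fork Empty x Empty)
    go (Fork l y r) k
      | x < y     = go l (\l' -> k (Fork l' y r))
      | x > y     = go r (\r' -> k (Fork l y r'))
      | otherwise = t           -- duplicate found: hand back the original tree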
If you have a predicate that checks whether the key is already in the tree, you can look before you leap:
insert x t = if contains t x then t else insert' x t
This traverses the tree twice, of course. Whether that's as bad as it sounds should be determined empirically: it might just load the relevant part of the tree into the cache.
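A sketch of what that could look like with the Tree type from the question (contains and insert' are hypothetical helpers, not code from the answer):
contains :: Ord a => Tree a -> a -> Bool
contains Empty _ = False
contains (Fork l y r) x
  | x < y     = contains l x
  | x > y     = contains r x
  | otherwise = True

-- plain insert that assumes the element is not already present
insert' :: Ord a => a -> Tree a -> Tree a
insert' x Empty = Fork Empty x Empty
insert' x (Fork l y r)
  | x < y     = Fork (insert' x l) y r
  | otherwise = Fork l y (insert' x r)

insert :: Ord a => a -> Tree a -> Tree a
insert x t = if contains t x then t else insert' x t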

Memoizing multiplication

My application multiplies vectors after a (costly) conversion using an FFT. As a result, when I write
f :: (Num a) => a -> [a] -> [a]
f c xs = map (c*) xs
I only want to compute the FFT of c once, rather than for every element of xs. There really isn't any need to store the FFT of c for the entire program, just in the local scope.
I attempted to define my Num instance like:
data Foo = Scalar c
| Vec Bool v -- the bool indicates which domain v is in
instance Num Foo where
(*) (Scalar c) = \x -> case x of
Scalar d -> Scalar (c*d)
Vec b v-> Vec b $ map (c*) v
(*) v1 = let Vec True v = fft v1
in \x -> case x of
Scalar d -> Vec True $ map (c*) v
v2 -> Vec True $ zipWith (*) v (fft v2)
Then, in an application, I call a function similar to f (which works on arbitrary Nums) where c=Vec False v, and I expected that this would be just as fast as if I hack f to:
g :: Foo -> [Foo] -> [Foo]
g c xs = let c' = fft c
in map (c'*) xs
The function g makes the memoization of fft c occur, and is much faster than calling f (no matter how I define (*)). I don't understand what is going wrong with f. Is it my definition of (*) in the Num instance? Does it have something to do with f working over all Nums, and GHC therefore being unable to figure out how to partially compute (*)?
Note: I checked the core output for my Num instance, and (*) is indeed represented as nested lambdas with the FFT conversion in the top level lambda. So it looks like this is at least capable of being memoized. I have also tried both judicious and reckless use of bang patterns to attempt to force evaluation to no effect.
As a side note, even if I can figure out how to make (*) memoize its first argument, there is still another problem with how it is defined: A programmer wanting to use the Foo data type has to know about this memoization capability. If she wrote
map (*c) xs
no memoization would occur. (It must be written as (map (c*) xs)) Now that I think about it, I'm not entirely sure how GHC would rewrite the (*c) version since I have curried (*). But I did a quick test to verify that both (*c) and (c*) work as expected: (c*) makes c the first arg to *, while (*c) makes c the second arg to *. So the problem is that it is not obvious how one should write the multiplication to ensure memoization. Is this just an inherent downside to the infix notation (and the implicit assumption that the arguments to * are symmetric)?
The second, less pressing issue is the case where we map (v*) onto a list of scalars. In this case, (hopefully) the fft of v would be computed and stored, even though it is unnecessary since the other multiplicand is a scalar. Is there any way around this?
Thanks
I believe the stable-memo package could solve your problem. It memoizes values not by equality but by reference identity:
Whereas most memo combinators memoize based on equality, stable-memo does it based on whether the exact same argument has been passed to the function before (that is, is the same argument in memory).
And it automatically drops memoized values when their keys are garbage collected:
stable-memo doesn't retain the keys it has seen so far, which allows them to be garbage collected if they will no longer be used. Finalizers are put in place to remove the corresponding entries from the memo table if this happens.
So if you define something like
fft = memo fft'
  where fft' = ... -- your old definition
you'll get pretty much what you need: Calling map (c *) xs will memoize the computation of fft inside the first call to (*) and it gets reused on subsequent calls to (c *). And if c is garbage collected, so is fft' c.
See also this answer to How to add fields that only cache something to ADT?
I can see two problems that might prevent memoization:
First, f has an overloaded type and works for all Num instances. So f cannot use memoization unless it is either specialized (which usually requires a SPECIALIZE pragma) or inlined (which may happen automatically, but is more reliable with an INLINE pragma).
Second, the definition of (*) for Foo performs pattern matching on the first argument, but f multiplies with an unknown c. So within f, even if specialized, no memoization can occur. Once again, it very much depends on f being inlined and a concrete argument for c being supplied, so that the memoization can actually happen.
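Concretely, the pragmas mentioned above would be attached to f roughly like this (a sketch; whether they actually enable the memoization still depends on how f is called and on the real definition of Foo):
f :: Num a => a -> [a] -> [a]
f c xs = map (c *) xs
{-# SPECIALIZE f :: Foo -> [Foo] -> [Foo] #-}
-- or, alternatively, let GHC inline f at every concrete call site:
-- {-# INLINE f #-}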
So I think it'd help to see how exactly you're calling f. Note that if f is defined using two arguments, it has to be given two arguments, otherwise it cannot be inlined. It would furthermore help to see the actual definition of Foo, as the one you are giving mentions c and v which aren't in scope.

Haskell monad return arbitrary data type

I am having trouble defining the return over a custom defined recursive data type.
The data type is as follows:
data A a = B a | C (A a) (A a)
However, I don't know how to define the return statement, since I can't figure out when to return a B value and when to recursively return a C.
Any help is appreciated!
One way to define a Monad instance for this type is to treat it as a free monad. In effect, this takes A a to be a little syntax with one binary operator C, and variables represented by values of type a embedded by the B constructor. That makes return the B constructor, embedding variables, and >>= the operator which performs substitution.
instance Monad A where
  return = B
  B x   >>= f = f x
  C l r >>= f = C (l >>= f) (r >>= f)
It's not hard to see that (>>= B) performs the identity substitution, and that composition of substitutions is associative.
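One practical note (mine, not part of the original answer): on modern GHC, Monad has Functor and Applicative as superclasses, so the instance above also needs something like
instance Functor A where
  fmap f (B x)   = B (f x)
  fmap f (C l r) = C (fmap f l) (fmap f r)

instance Applicative A where
  pure = B
  B f   <*> a = fmap f a
  C l r <*> a = C (l <*> a) (r <*> a)
-- with these, 'return = B' in the Monad instance can simply be left as the
-- default 'return = pure'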
Another, more "imperative" way to see this monad is that it captures the idea of computations that can flip coins (or read a bitstream or otherwise have some access to a sequence of binary choices).
data Coin = Heads | Tails
Any computation which can flip coins must either stop flipping and be a value (with B), or flip a coin and carry on (with C) in one way if the coin comes up Heads and another if Tails. The monadic operation which flips a coin and tells you what came up is
coin :: A Coin
coin = C (B Heads) (B Tails)
The >>= of A can now be seen as sequencing coin-flipping computations, allowing the choice of a subsequent computation to depend on the value delivered by an earlier computation.
If you have an infinite stream of coins, then (apart from your extraordinary good fortune) you're also lucky enough to be able to run any A-computation to its value, as follows
data Stream x = x :> Stream x -- actually, I mean "codata"
flipping :: Stream Coin -> A v -> v
flipping _ (B v) = v
flipping (Heads :> cs) (C h t) = flipping cs h
flipping (Tails :> cs) (C h t) = flipping cs t
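A tiny usage sketch of that reading (my example, not from the answer): sequence two coin flips with >>= and run the computation against a concrete stream of coins.
-- Flip two coins and report whether both came up Heads.
twoHeads :: A Bool
twoHeads =
  coin >>= \c1 ->
  coin >>= \c2 ->
  B (case (c1, c2) of (Heads, Heads) -> True
                      _              -> False)

-- An infinite supply of alternating coins.
alternating :: Stream Coin
alternating = Heads :> (Tails :> alternating)

-- flipping alternating twoHeads evaluates to False (Heads, then Tails)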
The general pattern in this sort of monad is to have one constructor for returning a value (B here) and a bunch of others which represent the choice of possible operations and the different ways computations can continue given the result of an operation. Here C has no non-recursive parameters and two subtrees, so I could tell that there must be just one operation and that it must have just two possible outcomes, hence flipping a coin.
So, it's substitution for a syntax with variables and one binary operator, or it's a way of sequencing computations that flip coins. Which view is better? Well... they're two sides of the same coin.
A good rule of thumb for return is to make it the simplest possible thing which could work (of course, any definition that satisfies the monad laws is fine, but usually you want something with minimal structure). In this case it's as simple as return = B (now write a (>>=) to match!).
By the way, this is an example of a free monad -- in fact, it's the example given in the documentation, so I'll let the documentation speak for itself.
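To spell out the connection (my sketch, assuming the free package's Control.Monad.Free): A a is isomorphic to Free Pair a for a two-field functor Pair, with B playing the role of Pure and C the role of Free applied to a Pair of subtrees.
{-# LANGUAGE DeriveFunctor #-}
import Control.Monad.Free (Free (..))

data Pair x = Pair x x
  deriving Functor

-- Free Pair a        ~  A a
--   Pure a           ~  B a
--   Free (Pair l r)  ~  C l r
toFree :: A a -> Free Pair a
toFree (B x)   = Pure x
toFree (C l r) = Free (Pair (toFree l) (toFree r))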

Resources