Haskell: Filtering based on an index vector, using only basic higher-order functions

The problem
I have a vector a of size N holding sample data, and another vector b of size M (N > M) holding indices. I would like to obtain a vector c of size M containing the elements of a selected by the indices in b. For example, for a = <10,20,30,40> and b = <0,2>, the result should be c = <10,30>.
The question
Is it possible to implement the desired function without list comprehensions, using just basic higher-order functions like map, zipWith and filter (more precisely, their vector equivalents mapV, zipWithV, filterV, etc.)?
Prerequisites:
I am using a Haskell Embedded Domain Specific Language (ForSyDe, module ForSyDe.Shallow.Vector), limited to a set of hardware-synthesizable functions. In order to respect the design methodology, I am allowed to use only the provided functions, so I cannot use list comprehensions and the like.

Disclaimer:
I could not test this code for functionality because cabal started acting up. The list version worked well, and since this converts every vector to a list it should behave the same, although problems may still arise.
Try this:
indexFilter :: (Num b, Eq b, Enum b) => Vector a -> Vector b -> Vector a
indexFilter v indices = vector (map fst (filter (\x -> snd x `elem` fromVector indices) vectorMap))
  where
    -- pair each element of the input with its index; note that the first
    -- parameter must not be named "vector", or it would shadow ForSyDe's
    -- vector :: [a] -> Vector a constructor used above
    vectorMap = zip (fromVector v) [0..]
indexFilter builds a list of tuples of the form (<element>, <index>) and then returns a vector of all elements whose index is in the vector b. vectorMap is just a zip of the elements of a with their indices in the vector.
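For illustration, this is how it should behave (a sketch; ForSyDe's Show instance prints vectors in angle brackets):

-- >>> indexFilter (vector [10, 20, 30, 40]) (vector [0, 2])
-- <10,30>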

Although the answer provided by ThreeFx is a correct answer to the question, it did not solve my problem due to several constraints enforced by the design methodology (ForSyDe), which were not mentioned:
lists cannot be used (they cannot be synthesized to other backends). ForSyDe provides two data containers: Signal (for temporal span) and Vector (for spatial span). This should ensure analyzability for system synthesis.
elem does not have a ForSyDe.Shallow.Vector implementation
Solution 1
Using only what the library provides, the shortest solution I found is:
indexFilter1 :: (Num b, Eq b, Enum b) => Vector a
             -> Vector b
             -> Vector (Vector a)
indexFilter1 v = mapV (\idx -> selectV idx 1 1 v)
The output vector can further be unwrapped, depending on the further usage.
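For example, since each inner vector holds exactly one element here, the result can be flattened back into a plain vector (a sketch, assuming ForSyDe's concatV is available):

indexFilter1' :: (Num b, Eq b, Enum b) => Vector a -> Vector b -> Vector a
indexFilter1' v = concatV . indexFilter1 v  -- flatten the one-element inner vectors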
Solution 2
Translating ThreeFx's solution to satisfy the constraints mentioned:
indexFilter :: (Num b, Eq b, Enum b) => Vector a
            -> Vector b
            -> Vector a
indexFilter v idx = mapV fst (filterV (\x -> elemV (snd x) idx) vectorMap)
  where
    vectorMap = zipWithV (\a b -> (b, a)) (iterateV size (+1) 0) v
    size      = lengthV v
    elemV a   = foldlV (\acc x -> if x == a then True else acc) False


Flattening tuples in Haskell

In Haskell we can flatten a list of lists (see: Flatten a list of lists).
For simple cases of tuples, I can see how we would flatten certain tuples, as in the following examples:
flatten :: (a, (b, c)) -> (a, b, c)
flatten x = (fst x, fst (snd x), snd (snd x))

flatten2 :: ((a, b), c) -> (a, b, c)
flatten2 x = (fst (fst x), snd (fst x), snd x)
However, I'm after a function that accepts as input any nested tuple and which flattens that tuple.
Can such a function be created in Haskell?
If one cannot be created, why is this the case?
No, it's not really possible. There are two hurdles to clear.
The first is that all the different sizes of tuples are different type constructors. (,) and (,,) are not really related to each other at all, except in that they happen to be spelled with a similar sequence of characters. Since there are infinitely many such constructors in Haskell, having a function which did something interesting for all of them would require a typeclass with infinitely many instances. Whoops!
The second is that there are some very natural expectations we naively have about such a function, and these expectations conflict with each other. Suppose we managed to create such a function, named flatten. Any one of the following chunks of code seems very natural at first glance, if taken in isolation:
flattenA :: ((Int, Bool), Char) -> (Int, Bool, Char)
flattenA = flatten
flattenB :: ((a, b), c) -> (a, b, c)
flattenB = flatten
flattenC :: ((Int, Bool), (Char, String)) -> (Int, Bool, Char, String)
flattenC = flatten
But taken together, they seem a bit problematic: flattenB = flatten can't possibly be type-correct if both flattenA and flattenC are! Both of the input types for flattenA and flattenC unify with the input type to flattenB -- they are both pairs whose first component is itself a pair -- but flattenA and flattenC return outputs with differing numbers of components. In short, the core problem is that when we write (a, b), we don't yet know whether a or b is itself a tuple and should be "recursively" flattened.
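To see the conflict concretely, here is one input flattened both ways; no single flatten can return both results (an illustrative sketch):

input :: ((Int, Bool), (Char, String))
input = ((1, True), ('x', "y"))

-- flattenB's reading: the second component is atomic
asB :: (Int, Bool, (Char, String))
asB = let ((a, b), c) = input in (a, b, c)

-- flattenC's reading: the second component is itself flattened
asC :: (Int, Bool, Char, String)
asC = let ((a, b), (c, d)) = input in (a, b, c, d)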
With sufficient effort, it is possible to do enough type-level programming to put together something that sometimes works on limited-size tuples. But it is 1. a lot of up-front effort, 2. very little long-term programming efficiency payoff, and 3. even at use sites requires a fair amount of boilerplate. That's a bad combo; if there's use-site boilerplate, then you might as well just write the function you cared about in the first place, since it's generally so short to do so anyway.

Efficient way to do n-ary branch / tabulated functions?

I'm trying to get some basic information on the performance characteristics of branches in SBV.
Let's suppose I have an SInt16 and a very sparse lookup table Map Int16 a. I can implement the lookup with nested ite:
import Data.SBV
import Data.Map (Map, toList)
import Data.Int (Int16)

sCase :: (Mergeable a) => SInt16 -> a -> Map Int16 a -> a
sCase x def = go . toList
  where
    go []          = def
    go ((k,v):kvs) = ite (x .== literal k) v (go kvs)
However, this means the generated tree will be very deep.
Does that matter?
If yes, is it better to instead generate a balanced tree of branches, effectively mirroring the Map's structure? Or is there some other scheme that would give even better performance?
If there are fewer than 256 entries in the map, would it change anything to "compress" it so that sCase works on an SInt8 and a Map Int8 a?
Is there some built-in SBV combinator for this use case that works better than iterated ite?
EDIT: It turns out that it matters a lot what my a is, so let me add some more detail to that. I am currently using sCase to branch in a stateful computation modeled as an RWS r w s a, with the following instances:
instance forall a. Mergeable a => Mergeable (Identity a) where
    symbolicMerge force cond thn els =
        Identity $ symbolicMerge force cond (runIdentity thn) (runIdentity els)

instance (Mergeable s, Mergeable w, Mergeable a, forall a. Mergeable a => Mergeable (m a)) => Mergeable (RWST r w s m a) where
    symbolicMerge force cond thn els =
        Lazy.RWST $ symbolicMerge force cond (runRWST thn) (runRWST els)
So stripping away all the newtypes, I'd like to branch into something of type r -> s -> (a, s, w) s.t. Mergeable s, Mergeable w and Mergeable a.
Symbolic look-ups are expensive
Symbolic array lookup will be expensive regardless of what data structure you use. It boils down to the fact that there's no information available to the symbolic execution engine to cut down the state space, so it ends up doing more or less what you coded yourself.
SMTLib Arrays
However, the best solution in these cases is to actually use SMT's support for arrays: http://smtlib.cs.uiowa.edu/theories-ArraysEx.shtml
SMTLib arrays are different from what you'd consider an array in a regular programming language: they do not have bounds. In that sense, an SMTLib array is more of a map from inputs to outputs, spanning the entire domain (i.e., they are equivalent to functions). But SMT solvers have custom theories for arrays and can thus handle problems involving them much more efficiently. (On the down-side, there's no notion of index-out-of-bounds, nor any way to control the range of elements you can access. You can code those up yourself on top of the abstraction though, leaving it up to you to decide how you want to handle such invalid accesses.)
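The theory is essentially McCarthy's read-over-write axioms. Phrased as an SBV predicate they look like this (a sketch, using the SArray type introduced below):

readOverWrite :: SArray Int16 Int16 -> SInt16 -> SInt16 -> SInt16 -> SBool
readOverWrite a i j v =
    ite (i .== j)
        (readArray (writeArray a i v) j .== v)              -- reading the index just written yields the new value
        (readArray (writeArray a i v) j .== readArray a j)  -- all other indices are untouched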
If you are interested in learning more about how SMT solvers deal with arrays, the classic reference is: http://theory.stanford.edu/~arbrad/papers/arrays.pdf
Arrays in SBV
SBV supports arrays, through the SymArray class: https://hackage.haskell.org/package/sbv-8.7/docs/Data-SBV.html#t:SymArray
The SFunArray type actually does not use SMTLib arrays. This was designed to support solvers that didn't understand Arrays, such as ABC: https://hackage.haskell.org/package/sbv-8.7/docs/Data-SBV.html#t:SFunArray
The SArray type fully supports SMTLib arrays: https://hackage.haskell.org/package/sbv-8.7/docs/Data-SBV.html#t:SArray
There are some differences between these types, and the above links describe them. However, for most purposes, you can use them interchangeably.
Converting a Haskell map to an SBV array
Going back to your original question, I'd be tempted to use an SArray to model such a look up. I'd code it as:
{-# LANGUAGE ScopedTypeVariables #-}

import Data.SBV
import qualified Data.Map as M
import Data.Int

-- Fill an SBV array from a map
mapToSArray :: (SymArray array, SymVal a, SymVal b) => M.Map a (SBV b) -> array a b -> array a b
mapToSArray m a = foldl (\arr (k, v) -> writeArray arr (literal k) v) a (M.toList m)
And use it as:
g :: Symbolic SBool
g = do let def = 0
       -- get a symbolic array, initialized with def
       arr <- newArray "myArray" (Just def)
       let m :: M.Map Int16 SInt16
           m = M.fromList [(5, 2), (10, 5)]
       -- Fill the array from the map
       let arr' :: SArray Int16 Int16 = mapToSArray m arr
       -- A simple problem:
       idx1 <- free "idx1"
       idx2 <- free "idx2"
       pure $ 2 * readArray arr' idx1 + 1 .== readArray arr' idx2
When I run this, I get:
*Main> sat g
Satisfiable. Model:
idx1 = 5 :: Int16
idx2 = 10 :: Int16
You can run it as satWith z3{verbose=True} g to see the SMTLib output it generates, which avoids costly lookups by simply delegating those tasks to the backend solver.
Efficiency
The question of whether this will be "efficient" really depends on how many elements your map has that you're constructing the array from. The larger the number of elements and the trickier the constraints, the less efficient it will be. In particular, if you ever write to an index that is symbolic, I'd expect slow-downs in solving time. If they're all constants, it should be relatively performant. As is usual in symbolic programming, it's really hard to predict any performance without seeing the actual problem and experimenting with it.
Arrays in the query context
The function newArray works in the symbolic context. If you're in a query context, instead use freshArray: https://hackage.haskell.org/package/sbv-8.7/docs/Data-SBV-Control.html#v:freshArray
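A minimal sketch of the query-context analogue of the earlier example (the name gQuery is illustrative):

{-# LANGUAGE ScopedTypeVariables #-}

import Data.SBV
import Data.SBV.Control
import Data.Int

gQuery :: IO ()
gQuery = runSMT $ query $ do
    -- allocate the array inside the query, initialized with 0
    arr :: SArray Int16 Int16 <- freshArray "myArray" (Just 0)
    idx <- freshVar "idx"
    constrain $ readArray arr idx .== 0
    cs <- checkSat
    io $ print cs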

Extract (a, b, c) from (Value a, Value b, Value c)

I'm using esqueleto for making SQL queries, and I have one query which returns data with type (Value a, Value b, Value c). I want to extract (a, b, c) from it. I know that I can use pattern matching like that:
let (Value a, Value b, Value c) = queryResult
But I'd like to avoid repeating Value for every tuple element. This is particularly annoying when the tuple has many more elements (say, 10). Is there any way to simplify this? Is there a function which I could use like this:
let (a, b, c) = someFunction queryResult
Data.Coerce from base provides coerce, which acts as your someFunction.
coerce "exchanges" newtypes for the underlying type they wrap (and visa-versa). This works even if they are wrapped deeply within other types. This is also done with zero overhead, since newtypes have the exact same runtime representation as the type they wrap.
There is a little bit more complexity with type variable roles that you can read about on the Wiki page if you're interested, but an application like this turns out to be straightforward since the package uses the "default" role for Value's type variable argument.
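A minimal sketch of how this looks (assuming Value is imported from esqueleto, where it is a newtype around the underlying value):

import Data.Coerce (coerce)
import Database.Esqueleto (Value (..))

someFunction :: (Value a, Value b, Value c) -> (a, b, c)
someFunction = coerce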
The library appears to have an unValue function, so you just need to choose a way to map over arbitrary length tuples. Then someFunction can become
import Control.Lens (over, each)
someFunction = over each unValue
If you want to try some other ways to map tuples without a lens dependency, you could check out this question: Haskell: how to map a tuple?
edit: As danidiaz points out, this only works for tuples with at most 8 fields. I'm not sure whether there's a better way to generalise it.
If your tuple has all the same element type:
all3 :: (a -> b) -> (a, a, a) -> (b, b, b)
all3 f (x, y, z) = (f x, f y, f z)
This case can be abstracted over with lenses, using over each as described in #Zpalmtree’s answer.
But if your tuple has different element types, you can make the f argument of this function polymorphic using the RankNTypes extension:
all3 :: (forall a. c a -> a) -> (c x, c y, c z) -> (x, y, z)
all3 f (x, y, z) = (f x, f y, f z)
Then assuming you have unValue :: Value a -> a, you can write:
(a, b, c) = all3 unValue queryResult
However, you would need to write separate functions all4, all5, …, all10 if you have large tuples. In that case you could cut down on the boilerplate by generating them with Template Haskell. This is part of the reason that large tuples are generally avoided in Haskell, since they’re awkward to work with and can’t be easily abstracted over.

Apply function to all pairs efficiently

I need a second order function pairApply that applies a binary function f to all unique pairs of a list-like structure and then combines them somehow. An example / sketch:
pairApply (+) f [a, b, c] = f a b + f a c + f b c
Some research leads me to believe that Data.Vector.Unboxed probably will have good performance (I will also need fast access to specific elements); it is also necessary for Statistics.Sample, which would come in handy further down the line.
With this in mind I have the following, which almost compiles:
import qualified Data.Vector.Unboxed as U

pairElement :: (U.Unbox a, U.Unbox b)
            => U.Vector a
            -> (a -> a -> b)
            -> Int
            -> a
            -> U.Vector b
pairElement v f idx el = U.map (f el) $ U.drop (idx + 1) v

pairUp :: (U.Unbox a, U.Unbox b)
       => (a -> a -> b)
       -> U.Vector a
       -> U.Vector (U.Vector b)
pairUp f v = U.imap (pairElement v f) v

pairApply :: (U.Unbox a, U.Unbox b)
          => (b -> b -> b)
          -> b
          -> (a -> a -> b)
          -> U.Vector a
          -> b
pairApply combine neutral f v = folder $ U.map folder (pairUp f v)
  where
    folder = U.foldl combine neutral
The reason this doesn't compile is that there is no Unbox instance for U.Vector (U.Vector a). I have been able to create new unboxed instances in other cases using Data.Vector.Unboxed.Deriving, but I'm not sure it would be so easy in this case (transform it to a pair whose first element is all the inner vectors concatenated and whose second is the lengths of the vectors, so we know how to unpack?)
My question can be stated in two parts:
Does the above implementation make sense at all or is there some quick library function magic etc that could do it much easier?
If so, is there a better way to make an unboxed vector of vectors than the one sketched above?
Note that I'm aware that foldl is probably not the best choice; once I've got the implementation sorted I plan to benchmark with a few different folds.
There is no way to define a classical instance for Unbox (U.Vector b), because that would require preallocating a memory area in which each element (i.e. each subvector!) has the same fixed amount of space. But in general, each of them may be arbitrarily big, so that's not feasible at all.
It might in principle be possible to define that instance by storing only a flattened form of the nested vector plus an extra array of indices (where each subvector starts). I once briefly gave this a try; it actually seems somewhat promising as far as immutable vectors are concerned, but a G.Vector instance also requires a mutable implementation, and that's hopeless for such an approach (because any mutation that changes the number of elements in one subvector would require shifting everything behind it).
Usually, it's just not worth it, because if the individual element vectors aren't very small the overhead of boxing them won't matter, i.e. often it makes sense to use B.Vector (U.Vector b).
For your application however, I would not do that at all – there's no need to ever wrap the upper element-choices in a single triangular array. (And it would be really bad for performance to do that, because it makes the algorithm take O(n²) memory rather than the O(n) which is all that's needed.)
I would just do the following:
pairApply combine neutral f v
    = U.ifoldl' (\acc i p -> U.foldl' (\acc' q -> combine acc' $ f p q)
                                      acc
                                      (U.drop (i+1) v))
                neutral v
This corresponds pretty much to the obvious nested-loops imperative implementation
pairApply(combine, b, f, v):
    for(i in 0..length(v)-1):
        for(j in i+1..length(v)-1):
            b = combine(b, f(v[i], v[j]));
    return b;
My answer is basically the same as leftaroundabout's nested-loops imperative implementation:
import Data.List (foldl')
import Data.Vector (Vector, (!))
import qualified Data.Vector as V

pairApply :: (Int -> Int -> Int) -> Vector Int -> Int
pairApply f v = foldl' (+) 0 [f (v ! i) (v ! j) | i <- [0 .. n-1], j <- [i+1 .. n-1]]
  where n = V.length v
As far as I can tell, there is no performance issue with this implementation.
Non-polymorphic for simplicity.
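A quick sanity check in GHCi (illustrative):

*Main> pairApply (*) (V.fromList [1, 2, 3])
11

since 1*2 + 1*3 + 2*3 = 11.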

Are there useful applications for the Divisible Type Class?

I've lately been working on an API in Elm where one of the main types is contravariant. So, I've googled around to see what one can do with contravariant types and found that the Contravariant package in Haskell defines the Divisible type class.
It is defined as follows:
class Contravariant f => Divisible f where
  divide  :: (a -> (b, c)) -> f b -> f c -> f a
  conquer :: f a
It turns out that my particular type does suit the definition of the Divisible type class. While Elm does not support type classes, I do look at Haskell from time to time for some inspiration.
My question: Are there any practical uses for this type class? Are there known APIs out there in Haskell (or other languages) that benefit from this divide-conquer pattern? Are there any gotchas I should be aware of?
Thank you very much for your help.
One example:
Applicative is useful for parsing, because you can turn Applicative parsers of parts into a parser of wholes, needing only a pure function for combining the parts into a whole.
Divisible is useful for serializing (should we call this coparsing now?), because you can turn Divisible serializers of parts into a serializer of wholes, needing only a pure function for splitting the whole into parts.
I haven't actually seen a project that worked this way, but I'm (slowly) working on an Avro implementation for Haskell that does.
When I first came across Divisible I wanted it for divide, and had no idea what possible use conquer could be other than cheating (an f a out of nowhere, for any a?). But to make the Divisible laws check out for my serializers conquer became a "serializer" that encodes anything to zero bytes, which makes a lot of sense.
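To make this concrete, here is a minimal sketch of such a serializer type (the Serializer newtype is hypothetical, not from any particular package):

import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Functor.Contravariant
import Data.Functor.Contravariant.Divisible

newtype Serializer a = Serializer { runSerializer :: a -> ByteString }

instance Contravariant Serializer where
  contramap f (Serializer g) = Serializer (g . f)

instance Divisible Serializer where
  -- conquer encodes anything to zero bytes
  conquer = Serializer (const BS.empty)
  -- divide splits the whole into parts, serializes each part, and concatenates
  divide split (Serializer sb) (Serializer sc) =
    Serializer $ \a -> let (b, c) = split a in sb b <> sc c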
Here's a possible use case.
In streaming libraries, one can have fold-like constructs like the ones from the foldl package, that are fed a sequence of inputs and return a summary value when the sequence is exhausted.
These folds are contravariant on their inputs, and can be made Divisible. This means that if you have a stream of elements where each element can be somehow decomposed into b and c parts, and you also happen to have a fold that consumes bs and another fold that consumes cs, then you can build a fold that consumes the original stream.
The actual folds from foldl don't implement Divisible, but they could, using a newtype wrapper. In my process-streaming package I have a fold-like type that does implement Divisible.
divide requires the return values of the constituent folds to be of the same type, and that type must be an instance of Monoid. If the folds return different, unrelated monoids, a workaround is to put each return value in a separate field of a tuple, leaving the other field as mempty. This works because a tuple of monoids is itself a Monoid.
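A sketch of such a newtype wrapper over the foldl package's Fold type (the MFold name is made up for illustration):

import Data.Functor.Contravariant
import Data.Functor.Contravariant.Divisible
import qualified Control.Foldl as L

-- A Fold with its result type pinned down, so the input type can vary contravariantly
newtype MFold r a = MFold (L.Fold a r)

instance Contravariant (MFold r) where
  contramap f (MFold g) = MFold (L.premap f g)

instance Monoid r => Divisible (MFold r) where
  -- consume anything, return the empty summary
  conquer = MFold (pure mempty)
  -- split each input element, feed the halves to the two folds, combine the summaries
  divide split (MFold fb) (MFold fc) =
    MFold (L.premap split ((<>) <$> L.premap fst fb <*> L.premap snd fc))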
I'll examine the example of the core data types in Fritz Henglein's generalized radix sort techniques as implemented by Edward Kmett in the discrimination package.
While there's a great deal going on there, it largely focuses around a type like this
data Group a = Group (forall b . [(a, b)] -> [[b]])
If you have a value of type Group a, you essentially must have an equivalence relationship on a, because if I give you an association between as and some type b completely unknown to you, then you can give me "groupings" of b.
groupId :: Group a -> [a] -> [[a]]
groupId (Group grouper) = grouper . map (\a -> (a, a))
You can see this as a core type for writing a utility library of groupings. For instance, we might want to know that if we can Group a and Group b then we can Group (a, b) (more on this in a second). Henglein's core idea is that if you can start with some basic Groups on integers—we can write very fast Group Int32 implementations via radix sort—and then use combinators to extend them over all types then you will have generalized radix sort to algebraic data types.
So how might we build our combinator library?
Well, f :: Group a -> Group b -> Group (a, b) is pretty important in that it lets us make groups of product-like types. Normally, we'd get this from Applicative and liftA2, but Group, you'll notice, is Contravariant, not a Functor.
So instead we use Divisible
divided :: Group a -> Group b -> Group (a, b)
Notice that this arises in a strange way from
divide :: (a -> (b, c)) -> Group b -> Group c -> Group a
as it has the typical "reversed arrow" character of contravariant things. We can now understand things like divide and conquer in terms of their interpretation on Group.
Divide says that if I want to build a strategy for equating as using strategies for equating bs and cs, I can do the following for any type x:

1. Take your partial relation [(a, x)] and map over it with a function f :: a -> (b, c), plus a little tuple manipulation, to get a new relation [(b, (c, x))].
2. Use my Group b to discriminate [(b, (c, x))] into [[(c, x)]].
3. Use my Group c to discriminate each [(c, x)] into [[x]], giving me [[[x]]].
4. Flatten the inner layers to get [[x]] like we need.
instance Divisible Group where
  conquer = Group $ return . fmap snd
  divide k (Group l) (Group r) = Group $ \xs ->
    -- a bit more cleverly done here...
    l [ (b, (c, d)) | (a, d) <- xs, let (b, c) = k a ] >>= r
We also get interpretations of the more tricky Decidable refinement of Divisible
class Divisible f => Decidable f where
  lose   :: (a -> Void) -> f a
  choose :: (a -> Either b c) -> f b -> f c -> f a

instance Decidable Group where
  lose   :: (a -> Void) -> Group a
  choose :: (a -> Either b c) -> Group b -> Group c -> Group a
These read as saying that for any type a of which we can guarantee there are no values (we cannot produce values of Void by any means; a function a -> Void is a means of producing Void given an a, thus we must not be able to produce values of a by any means either!), we immediately get a grouping of zero values:
lose _ = Group (\_ -> [])
We can also play a similar game to divide above, except instead of sequencing our use of the input discriminators, we alternate between them; a sketch follows.
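Here is what such a choose could look like (a sketch only; the real implementation in the discrimination package is more sophisticated):

instance Decidable Group where
  lose _ = Group (\_ -> [])
  choose k (Group l) (Group r) = Group $ \xs ->
    let tagged = [ (k a, d) | (a, d) <- xs ]
    in  l [ (b, d) | (Left  b, d) <- tagged ]
     ++ r [ (c, d) | (Right c, d) <- tagged ]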
Using these techniques we build up a library of "groupable" things, namely the Grouping class:

class Grouping a where
  grouping :: Group a

and note that nearly all the definitions arise from the basic definition atop groupingNat, which uses fast monadic vector manipulations to achieve an efficient radix sort.
