Standard way to zip two lists using custom comparator functions

Standard way to zip two lists using custom comparator functions - haskell

If often need to zip two lists, discarding non matching elements (where "matching" is definied by comparing parts of the elements of the lists). For example:
let as = [(1,"a"), (2,"b"), (4,"c")]
let bs = [(2,"a"), (3,"b"), (4,"c"), (5, "d")]
zipWithAdjust (fst) (fst) as bs
-- ~> [((2,"b"),(2,"a")), ((4,"c"),(4,"c"))]
I implemented zipWithAdjust as follows:
zipWithAdjust :: (Ord c, Show a, Show b, Show c) => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a,b)]
zipWithAdjust cmpValA cmpValB (a:as) (b:bs)
| cmpValA a == cmpValB b = (a,b) : zipWithAdjust cmpValA cmpValB as bs
| cmpValA a > cmpValB b = zipWithAdjust cmpValA cmpValB (a:as) bs
| cmpValA a < cmpValB b = zipWithAdjust cmpValA cmpValB as (b:bs)
zipWithAdjust _ _ _ _ = []
It works fine, but I have a feeling that there is a standard way to do such zip. I found Data.Align and this SO Question but can't figure out how to use it for my use case.
Is there a standard way to do this (using library functions)? Is it Data.Align? If so, how do I implement the above function using Data.Align?
Edit: Changed the < case to get implementation matching with example.

As far as I know, there's no such function. However, you could make your function more general by using (a -> b -> Ordering) instead of two additional functions:
zipWithAdjust :: (a -> b -> Ordering) -> [a] -> [b] -> [(a,b)]
zipWithAdjust cmp (a:as) (b:bs)
| ord == LT = zipWithAdjust cmp as (b:bs)
| ord == GT = zipWithAdjust cmp (a:as) (bs)
| ord == EQ = (a,b) : zipWithAdjust cmp as bs
where ord = cmp a b
zipWithAdjust _ _ _ = []
result = zipWithAdjust (\x y -> compare (fst x) (fst y)) [(1,"a"), (2,"b"), (4,"c")] [(2,"a"), (3,"b"), (4,"c"), (5, "d")]
However, I wouldn't call this zip anymore, but something like compareMerge or similar.

You might like Data.Map's intersection capabilities. It's slightly less capable in some ways and more capable in others. For example:
> let as = fromList [(1,"a"), (2,"b"), (4,"c")]; bs = fromList [(2,"a"), (3,"b"), (4,"c"), (5, "d")]
> intersectionWith (,) as bs
fromList [(2,("b","a")),(4,("c","c"))]

I would say it's a bit more idiomatic to say
zipWithAdjust cmpA cmpB (a:as) (b:bs) =
case cmpA a `compare` cmpB b of
EQ -> (a, b) : zipWithAdjust cmpA cmpB as bs
GT -> zipWithAdjust cmpA cmpB (a:as) bs
LT -> zipWithAdjust cmpA cmpB as (b:bs)
zipWithAdjust _ _ _ _ = []
It would certainly be faster, since it reduces the number of times you have to calculate cmpA a and cmpB b. This isn't truly a zip since you are filtering at the same time, and also offsetting in your GT and LT cases. I would say that this solution is perfectly fine as it is, there isn't a need to implement it using standard functions.

edit: using These a b type from Data.These (used by Data.Align), with this:
ordzipBy :: (Ord t) => (a -> t) -> (b -> t) -> [a] -> [b] -> [These a b]
ordzipBy f g a#(x:t) b#(y:r) = case compare (f x) (g y) of
LT -> This x : ordzipBy f g t b
GT -> That y : ordzipBy f g a r
EQ -> These x y : ordzipBy f g t r
ordzipBy _ _ a [] = map This a
ordzipBy _ _ [] b = map That b
we can express three set operations as:
diffBy :: (Ord t) => (a -> t) -> (b -> t) -> [a] -> [b] -> [a]
meetBy :: (Ord t) => (a -> t) -> (b -> t) -> [a] -> [b] -> [(a, b)]
joinBy :: (Ord t) => (a -> t) -> (a->a->a) -> [a] -> [a] -> [a]
diffBy f g xs ys = [x | This x <- ordzipBy f g xs ys]
meetBy f g xs ys = [(x,y) | These x y <- ordzipBy f g xs ys]
joinBy f h xs ys = mergeThese h `map` ordzipBy f f xs ys
what you describe is meetBy, i.e. set intersection operation, with the two ordered lists seen as sets.
The ability of a compiler to efficiently compile these definitions is another question though. The three set functions hand-coded along the lines of ordzipBy might run faster.
ordzipBy f g is compatible with align, and [] with nil, but the type machinery involved in making it happen is above my pay grade. :) Also, it's not clear to me whether the law align (f <$> xs) (g <$> ys) = bimap f g <$> align xs ys would make sense at all because mapping the functions f and g can very well change the mutual ordering of elements of xs and ys.
The two problems (the types, and the law) are related: the parts of data recovered by selector functions for ordering purposes serve as positions, as shape, yet are part of the original data. (cf. instance Alternative ZipList in Haskell?).
update: see if the following works as you expected.
{-# LANGUAGE InstanceSigs, DatatypeContexts #-}
import Data.These
import Data.Align
newtype Ord a => ZL a b = ZL {unzl :: [(a,b)]}
deriving (Eq, Show)
instance Ord a => Functor (ZL a) where
fmap f (ZL xs) = ZL [(k, f v) | (k,v)<-xs]
instance Ord a => Align (ZL a) where
nil = ZL []
align :: (ZL a b) -> (ZL a c) -> (ZL a (These b c))
align (ZL a) (ZL b) = ZL (g a b) where
g a#((k,x):t) b#((n,y):r) = case compare k n of
LT -> (k, This x ) : g t b
GT -> (n, That y) : g a r
EQ -> (k, These x y) : g t r
g a [] = [(k, This x) | (k,x) <- a]
g [] b = [(n, That y) | (n,y) <- b]
diffBy :: (Ord t) => (a -> t) -> (b -> t) -> [a] -> [b] -> [a]
meetBy :: (Ord t) => (a -> t) -> (b -> t) -> [a] -> [b] -> [(a, b)]
joinBy :: (Ord t) => (a -> t) -> (a->a->a) -> [a] -> [a] -> [a]
diffBy f g xs ys = catThis . map snd . unzl
$ align (ZL [(f x,x) | x<-xs]) (ZL [(g y,y) | y<-ys])
meetBy f g xs ys = catThese . map snd . unzl
$ align (ZL [(f x,x) | x<-xs]) (ZL [(g y,y) | y<-ys])
joinBy f h xs ys = map (mergeThese h . snd) . unzl
$ align (ZL [(f x,x) | x<-xs]) (ZL [(f y,y) | y<-ys])
Infinite lists aren't handled well though, while the hand-coded functions can obviously quite easily be made to handle such cases correctly:
*Main> diffBy id id [1..5] [4..9]
[1,2,3]
*Main> diffBy id id [1..5] [4..]
[1,2,3Interrupted.
*Main> meetBy id id [1,3..10] [2,5..20]
[(5,5)]
*Main> joinBy id const [1,3..10] [2,5..20]
[1,2,3,5,7,8,9,11,14,17,20]

Related

mapEither inserting both Left and Right

Using the function mapEither for multiset's I can turn a MultiSet into a pair of two multisets. When f is returning Left the element is inserted into the first Multiset of the pair, and if f is returning Right the element is inserted into the second MultiSet of the pair.
How can I insert the same element into both MultiSets at the same time, as if f were returning Right and Left at the same time?
f:: LocalType -> Either LocalType LocalType
f (Sometype lt) = Left lt -- And Right lt
f lt = Left lt
parRule :: (MultiSet LocalType) -> (MultiSet LocalType)
parRule sequent = do
let list = MultiSet.mapEither f sequent
For reference, I use Data.Multiset package, https://hackage.haskell.org/package/multiset-0.3.4.3/docs/Data-MultiSet.html.

You can use a type like These to capture the ability to return both. You can then use toAscOccurList and fromOccurList (or fromAscOccurList if your function is monotonic) to compute the new MultiSet.

You could use These as Daniel Wagner suggests, but I would use a slightly different function to start with, which seems like a slightly better match to the library API. Furthermore, I would recommend a different implementation strategy for performance.
data SP a b = SP !a !b
toPair :: SP a b -> (a, b)
toPair (SP a b) = (a, b)
mapPairOcc :: (Ord b, Ord c) => (a -> Occur -> ((b, Occur), (c, Occur))) -> MultiSet a -> (MultiSet b, MultiSet c)
mapPairOcc f = toPair . mapPairOcc' f
mapPairOcc' :: (Ord b, Ord c) => (a -> Occur -> ((b, Occur), (c, Occur))) -> MultiSet a -> SP (MultiSet b) (MultiSet c)
mapPairOcc' f = foldl' go (SP empty empty) . toAscOccurList
where
go (SP bs cs) a
| ((b, bn), (c, cn)) <- f a
= SP (insertMany b bn bs) (insertMany c cn cs)
When you know that f is strictly monotone in the sense that
a < a' ==> fst (f a) < fst (f a') /\ snd (f a) < snd (f a')
it's possible to do better, building the results in O(n) time. The best way to do this seems to be to use Data.Map internals. I'll reuse the SP type from above.
import Data.Map.Lazy (Map)
import Data.MultiSet (MultiSet, Occur)
import qualified Data.MultiSet as MS
import qualified Data.Map.Internal as M
import Control.Monad (guard)
-- | Map over the keys and values in a map, producing
-- two maps with new keys and values. The passed function
-- must be strictly monotone in the keys in the sense
-- described above.
mapMaybeWithKey2Mono :: (k -> a -> (Maybe (l,b), Maybe (m,c))) -> Map k a -> (Map l b, Map m c)
mapMaybeWithKey2Mono f = toPair . mapMaybeWithKey2Mono' f
mapMaybeWithKey2Mono' :: (k -> a -> (Maybe (l,b), Maybe (m,c))) -> Map k a -> SP (Map l b) (Map m c)
mapMaybeWithKey2Mono' _ M.Tip = SP M.Tip M.Tip
mapMaybeWithKey2Mono' f (M.Bin _ kx x l r)
| (fl, fr) <- f kx x
= SP (groink fl mfl1 mfr1) (groink fr mfl2 mfr2)
where
groink :: Maybe (q, x) -> Map q x -> Map q x -> Map q x
groink m n o = case m of
Just (k', y) -> M.link k' y n o
Nothing -> M.link2 n o
SP mfl1 mfl2 = mapMaybeWithKey2Mono' f l
SP mfr1 mfr2 = mapMaybeWithKey2Mono' f r
Using this new general Map function, we can define the function we want on multisets:
mapPairAscOcc :: (a -> Occur -> ((b, Occur), (c, Occur))) -> MultiSet a -> (MultiSet b, MultiSet c)
mapPairAscOcc f m
| (p, q) <- mapMaybeWithKey2Mono go . MS.toMap $ m
= (MS.fromOccurMap p, MS.fromOccurMap q)
where
-- a -> Occur -> (Maybe (b, Occur), Maybe (c, Occur))
go a aocc
| ((b, bocc), (c, cocc)) <- f a aocc
= ( (b, bocc) <$ guard (bocc > 0)
, (c, cocc) <$ guard (cocc > 0) )

I took the function mapEither from the Data.MultiSet and modified it such that it supports These type.
-- | /O(n)/. Map and separate the 'This' and 'That' or 'These' results
-- modified function of mapEither to map both cases in case f return These
-- code of mapEither found in source code,
mapThese :: (Ord b, Ord c) => (a -> These b c) -> MultiSet a -> (MultiSet b, MultiSet c)
mapThese f = (\(ls,rs) -> (MultiSet.fromOccurList ls, MultiSet.fromOccurList rs)) . mapThese' . MultiSet.toOccurList
where mapThese' [] = ([],[])
mapThese' ((x,n):xs) = case f x of
This l -> let (ls,rs) = mapThese' xs in ((l,n):ls, rs)
That r -> let (ls,rs) = mapThese' xs in (ls, (r,n):rs)
These u i -> let (ls,rs) = mapThese' xs in ((u,n):ls, (i,n):rs)
In the case f returns These, both MultiSet's have an added element.

How to update a structure with recursion schemes?

In recursion schemes, how can I construct something with type definition like (Recursive t, CoRecursive t) -> t -> ? -> t
I try to use recursion-schemes to update nodes. Taking list as an example, I can come up with two methods like:
update :: [a] -> Natural -> a -> [a]
update = para palg where
palg Nil _ _ = []
palg (Cons a (u, _)) 0 b = b : u
palg (Cons a (u, f)) n b = a : f (n-1) b
update' :: [a] -> Natural -> a -> [a]
update' = c2 (apo acoalg) where
c2 f a b c = f (a,b,c)
acoalg ([], _, _) = Nil
acoalg (_:as , 0, b) = Cons b $ Left as
acoalg (a:as , n, b) = Cons a $ Right (as, n-1, b)
However, these two implementations are good. In these two implementations, the constructor of ListF and [] appears in both sides of the equation. And the definition does not appear to be unique. Is there a better way to perform List update with recursion schemes?

Recursion schemes is flexible approach. You can also implement your own variant.
(Reuse cata)
zipo :: (Recursive g, Recursive h) => (Base g (h -> c) -> Base h h -> c) -> g -> h -> c
zipo alg = cata zalg
where
zalg x = alg x <<< project
update :: forall a. [a] -> Natural -> a -> [a]
update xs n a = zipo alg n xs
where
alg :: Maybe ([a] -> [a]) -> ListF a [a] -> [a]
alg _ Nil = []
alg Nothing (Cons y ys) = a:ys
alg (Just n') (Cons y ys) = y:(n' ys)
Also u can implement some parallel version like
zipCata :: (Recursive g, Recursive h) => ((g -> h -> r) -> Base g g -> Base h h -> r) -> g -> h -> r
zipCata phi x y = phi (zipCata phi) (project x) (project y)
update' :: forall a. [a] -> Natural -> a -> [a]
update' xs n a = zipCata alg n xs
where
alg :: (Natural -> [a] -> [a]) -> Maybe Natural -> ListF a [a] -> [a]
alg _ _ Nil = []
alg _ Nothing (Cons _ ys) = a:ys
alg f (Just n) (Cons y ys) = y:(f n ys)
Both variants (also as your) will be get the same result
PS. I hate approach for code sample on SO

Histomorphism a la Mendler

Using a histomorphism (histo) from recursion-schemes I can get the a list containing only the odd indexes from an initial list:
import Data.Functor.Foldable
odds :: [a] -> [a]
odds = histo $ \case
Nil -> []
Cons h (_ :< Nil) -> [h]
Cons h (_ :< Cons _ (t :< _)) -> h:t
How can get the same thing using mhisto?
nil = Fix Nil
cons a b = Fix $ Cons a b
list = cons 1 $ cons 2 $ cons 3 $ nil
modds :: Fix (ListF a) -> [a]
modds = mhisto alg where
alg _ _ Nil = []
alg f g (Cons a b) = ?

This is it:
modds :: Fix (ListF a) -> [a]
modds = mhisto alg
where
alg _ _ Nil = []
alg odd pre (Cons a b) = a : case pre b of
Nil -> []
Cons _ b' -> odd b'
GHCi> list = cata embed [1..10] :: Fix (ListF Int)
GHCi> odds (cata embed list)
[1,3,5,7,9]
GHCi> modds list
[1,3,5,7,9]
odd folds the rest of the list, while pre digs the predecessor. Note how the availability of an y -> f y function in the Mendler algebra mirrors the introduction of Cofree in the ordinary histomorphism algebra (in which digging back can be done by reaching for the tail of the Cofree stream):
cata :: Functor f => (f c -> c) -> Fix f -> c
histo :: Functor f => (f (Cofree f c) -> c) -> Fix f -> c
mcata :: (forall y. (y -> c) -> f y -> c) -> Fix f -> c
mhisto :: (forall y. (y -> c) -> (y -> f y) -> f y -> c) -> Fix f -> c
For further reading on mcata and mhisto, see chapters 5 and 6 of Categorical programming with inductive and coinductive types, by Varmo Vene.

How does the expression `ap zip tail` work

I wondered how to write f x = zip x (tail x) in point free. So I used the pointfree program and the result was f = ap zip tail. ap being a function from Control.Monad
I do not understand how the point free definition works. I hope I can figure it out if I can comprehend it from the perspective of types.
import Control.Monad (ap)
let f = ap zip tail
let g = ap zip
:info ap zip tail f g
ap :: Monad m => m (a -> b) -> m a -> m b
-- Defined in `Control.Monad'
zip :: [a] -> [b] -> [(a, b)] -- Defined in `GHC.List'
tail :: [a] -> [a] -- Defined in `GHC.List'
f :: [b] -> [(b, b)] -- Defined at <interactive>:3:5
g :: ([a] -> [b]) -> [a] -> [(a, b)]
-- Defined at <interactive>:4:5
By looking at the expression ap zip tail I would think that zip is the first parameter of ap and tail is the second parameter of ap.
Monad m => m (a -> b) -> m a -> m b
\--------/ \---/
zip tail
But this is not possible, because the types of zip and tail are completely different than what the function ap requires. Even with taking into consideration that the list is a monad of sorts.

So the type signature of ap is Monad m => m (a -> b) -> m a -> m b. You've given it zip and tail as arguments, so let's look at their type signatures.
Starting with tail :: [a] -> [a] ~ (->) [a] [a] (here ~ is the equality operator for types), if we compare this type against the type of the second argument for ap,
(->) [x] [x] ~ m a
((->) [x]) [x] ~ m a
we get a ~ [x] and m ~ ((->) [x]) ~ ((->) a). Already we can see that the monad we're in is (->) [x], not []. If we substitute what we can into the type signature of ap we get:
(((->) [x]) ([x] -> b)) -> (((->) [x]) [x]) -> (((->) [x]) b)
Since this is not very readable, it can more normally be written as
([x] -> ([x] -> b)) -> ([x] -> [x]) -> ([x] -> b)
~ ([x] -> [x] -> b ) -> ([x] -> [x]) -> ([x] -> b)
The type of zip is [x] -> [y] -> [(x, y)]. We can already see that this lines up with the first argument to ap where
[x] ~ [x]
[y] ~ [x]
[(x, y)] ~ b
Here I've listed the types vertically so that you can easily see which types line up. So obviously x ~ x, y ~ x, and [(x, y)] ~ [(x, x)] ~ b, so we can finish substituting b ~ [(x, x)] into ap's type signature and get
([x] -> [x] -> [(x, x)]) -> ([x] -> [x]) -> ([x] -> [(x, x)])
-- zip tail ( ap zip tail )
-- ap zip tail u = zip u (tail u)
I hope that clears things up for you.
EDIT: As danvari pointed out in the comments, the monad (->) a is sometimes called the reader monad.

There are two aspects to understanding this:
The type magic
The information flow of the implementation
Firstly, this helped me understand the type magic:
1) zip : [a] → ( [a] → [(a,a)] )
2) tail : [a] → [a]
3) zip <*> tail : [a] → [(a,a)]
4) <*> : Applicative f ⇒ f (p → q) → f p → f q
In this case, for <*>,
5) f x = y → x
Note that in 5, f is a type constructor. Applying f to x produces a type. Also, here = is overloaded to mean equivalence of types.
y is currently a place-holder, in this case, it is [a], which means
6) f x = [a] -> x
Using 6, we can rewrite 1,2 and 3 as follows:
7) zip : f ([a] → [(a,a)])
8) tail : f [a]
9) zip <*> tail : f ([a] → [(a,a)]) → f [a] → f [(a,a)]
So, looking at 4, we are substituting as follows:
10) p = [a]
11) q = [(a,a)]
12) f x = [a] → x
(Repetition of 6 here again as 12 )
Secondly, the information flow, i.e. the actual functionality. This is easier, it is clear from the definition of <*> for the Applicative instance of y →, which is rewritten here with different identifier names and using infix style:
13) g <*> h $ xs = g xs (h xs)
Substituting as follows:
14) g = zip
15) h = tail
Gives:
zip <*> tail $ xs (Using 14 and 15)
==
zip xs (tail xs) (Using 13 )

relation between monadic filter and fold

Many higher-order functions can be defined in term of the fold function. For example, here is the relation between filter and foldl in Haskell.
myFilter p [] = []
myFilter p l = foldl (\y x -> if (p x) then (x:y) else y) [] (reverse l)
Is there a similar relation between their monadic versions filterM and foldM ? How can I write filterM in term of foldM ?
I tried hard to find a monadic equivalent to \y x -> if (p x) then (x:y) else y to plug into foldM without success.

Like in D.M.'s answer, only without the reverse. Let the types guide you:
import Control.Monad
{-
foldM :: (Monad m) => (b -> a -> m b) -> b -> [a] -> m b
filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
-}
filtM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
filtM p xs = foldM f id xs >>= (return . ($ []))
where
f acc x = do t <- p x
if t then return (acc.(x:)) else return acc

Not sure that it has any sense (since it has that strange reverse), but at least it type checked well:
myFilterM :: Monad m => (a -> m Bool) -> [a] -> m [a]
myFilterM p l = foldM f [] (reverse l)
where
f y x = do
p1 <- p x
return $ if p1 then (x:y) else y

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Standard way to zip two lists using custom comparator functions - haskell

Related

mapEither inserting both Left and Right

How to update a structure with recursion schemes?

Histomorphism a la Mendler

How does the expression `ap zip tail` work

relation between monadic filter and fold

Categories

Resources