The dlist package contains the DList data type, which has lots of instances, but not Foldable or Traversable. In my mind, these are two of the most "list-like" type classes. Is there a performance reason that DList is not an instance of these classes?
Also, the package does implement foldr and unfoldr, but none of the other folding functions.
One alternative you should consider instead of DList is to use Church-encoded lists. The idea is that you represent a list as an opaque value that knows how to run a foldr over its own elements. This requires the RankNTypes extension:
{-# LANGUAGE RankNTypes #-}
import Prelude
import Control.Applicative
import Data.Foldable (Foldable)
import qualified Data.Foldable as F
import Data.Traversable (Traversable)
import qualified Data.Traversable as T
-- | Laws:
--
-- > runList xs cons nil == xs
-- > runList (fromList xs) f z == foldr f z xs
-- > foldr f z (toList xs) == runList xs f z
newtype ChurchList a =
ChurchList { runList :: forall r. (a -> r -> r) -> r -> r }
-- | Make a 'ChurchList' out of a regular list.
fromList :: [a] -> ChurchList a
fromList xs = ChurchList $ \k z -> foldr k z xs
-- | Turn a 'ChurchList' into a regular list.
toList :: ChurchList a -> [a]
toList xs = runList xs (:) []
-- | We can construct an empty 'ChurchList' without using @[]@.
nil :: ChurchList a
nil = ChurchList $ \_ z -> z
-- | The 'ChurchList' counterpart to '(:)'. Unlike 'DList', whose
-- implementation uses the regular list type, 'ChurchList' doesn't
-- rely on it at all.
cons :: a -> ChurchList a -> ChurchList a
cons x xs = ChurchList $ \k z -> k x (runList xs k z)
-- | Append two 'ChurchList's. This runs in O(1) time. Note that
-- there is no need to materialize the lists as @[a]@.
append :: ChurchList a -> ChurchList a -> ChurchList a
append xs ys = ChurchList $ \k z -> runList xs k (runList ys k z)
-- | Map over a 'ChurchList'. No need to materialize the list.
instance Functor ChurchList where
fmap f xs = ChurchList $ \k z -> runList xs (\x xs' -> k (f x) xs') z
-- | The 'Foldable' instance is trivial, given the 'ChurchList' law.
instance Foldable ChurchList where
foldr f z xs = runList xs f z
instance Traversable ChurchList where
traverse f xs = runList xs step (pure nil)
where step x rest = cons <$> f x <*> rest
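A quick sanity check of these instances (my own example, not from the original answer):
*Main> F.toList (append (cons 1 nil) (fromList [2,3]))
[1,2,3]
*Main> toList <$> T.traverse Just (fromList [1,2,3])
Just [1,2,3]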
The downside to this is that there is no efficient tail operation for a ChurchList: folding a ChurchList is cheap, but taking repeated tails is costly.
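To make that concrete, here is one way a tail could be written (tailCL is my own name for it); it has to rebuild the remainder by folding over the entire structure, so every tail costs O(n):
tailCL :: ChurchList a -> ChurchList a
tailCL xs = fst (runList xs step (nil, nil))
  where step x (_, rest) = (rest, cons x rest)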
DList a is a newtype wrapper around [a] -> [a], which has an a in a contravariant position, so it cannot implement Foldable or Traversable, or even Functor directly. The only way to implement them is to convert to and from regular lists (see the foldr implementation), which defeats the performance advantage of difference lists.
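A hedged sketch of what such an instance is forced to look like, with a DList-style newtype written out here for illustration (the names are mine, not dlist's):
newtype DList' a = DL' ([a] -> [a])

instance Foldable DList' where
  foldr f z (DL' g) = foldr f z (g [])  -- materialize the ordinary list first, then fold it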
Consider these various attempts at something that works like last:
Prelude> import Data.Foldable
Prelude Data.Foldable> foldr const undefined (reverse [1,2,3])
3
Prelude Data.Foldable> foldr' const undefined (reverse [1,2,3])
3
Prelude Data.Foldable> foldl (flip const) undefined [1,2,3]
3
Prelude Data.Foldable> foldl' (flip const) undefined [1,2,3]
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
undefined, called at <interactive>:5:21 in interactive:Ghci4
It makes sense to me that foldl and foldr both work, since they aren't strict in their accumulator, and it makes sense to me that foldl' doesn't, since it is. But why does foldr' work? Isn't it supposed to be strict in its accumulator too?
For reference, the instance Foldable [] overrides foldr, foldl, foldl', but not foldr' (source):
instance Foldable [] where
elem = List.elem
foldl = List.foldl
foldl' = List.foldl'
foldl1 = List.foldl1
foldr = List.foldr
{- ... -}
foldr' is defined by default as (source):
foldr' :: (a -> b -> b) -> b -> t a -> b
foldr' f z0 xs = foldl f' id xs z0
where f' k x z = k $! f x z
Note that there is only a strictness annotation on the result of f. So the initial accumulator is not forced.
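Concretely, the default unfolds roughly like this on the example from the question (my own expansion):
foldr' const undefined [3,2,1]
  = foldl f' id [3,2,1] undefined           -- f' k x z = k $! const x z
  = f' (f' (f' id 3) 2) 1 undefined
  = f' (f' id 3) 2 $! const 1 undefined     -- const never looks at undefined
  = f' (f' id 3) 2 $! 1
  = f' id 3 $! const 2 1
  = id $! const 3 2
  = 3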
This suggests a different implementation which does force the accumulator:
foldr'' :: Foldable t => (a -> b -> b) -> b -> t a -> b
foldr'' f = foldr (\x z -> f x $! z)
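With that variant, the example from the question now fails as I would expect (a hypothetical GHCi session, after defining foldr'' as above):
Prelude Data.Foldable> foldr'' const undefined (reverse [1,2,3])
*** Exception: Prelude.undefined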
I have no idea why one was chosen over the other. It is probably an oversight, and it would be more consistent for the Foldable [] instance to override foldr' instead of relying on the default implementation.
As an aside, the default definition of foldl' is also different from the list one in the same way:
-- Default (class Foldable t where ...)
foldl' :: (b -> a -> b) -> b -> t a -> b
foldl' f z0 xs = foldr f' id xs z0
where f' x k z = k $! f z x
-- List implementation
foldl' :: forall a b . (b -> a -> b) -> b -> [a] -> b
foldl' k z0 xs =
foldr (\(v::a) (fn::b->b) -> oneShot (\(z::b) -> z `seq` fn (k z v))) (id :: b -> b) xs z0
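The flip side of that difference: a container whose Foldable instance only supplies foldr gets the default foldl', which, as far as I can tell, never forces the initial accumulator. A sketch with a made-up wrapper type:
import Data.Foldable (foldl')

newtype Wrap a = Wrap [a]

instance Foldable Wrap where
  foldr f z (Wrap xs) = foldr f z xs  -- only foldr is supplied; foldl' falls back to the default

-- so, unlike the plain list case above, I would expect:
-- ghci> foldl' (flip const) undefined (Wrap [1,2,3])
-- 3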
I am wondering how to implement nub over a Seq a.
I get that one could do:
nubSeq :: Seq a -> Seq a
nubSeq = fromList . nub . toList
I am just wondering: is there something standard that does not convert to lists in order to call nub :: [a] -> [a]?
An implementation that occurred to me, based obviously on nub, is:
nubSeq :: (Eq a) => Seq a -> Seq a
nubSeq = Data.Sequence.foldrWithIndex
(\_ x a -> case x `Data.Sequence.elemIndexR` a of
Just _ -> a
Nothing -> a |> x) Data.Sequence.empty
But there must be something more elegant?
Thanks.
Not sure whether this qualifies as more elegant, but it splits the concerns into independent functions (caveat: you need an Ord constraint on a):
seqToNubMap takes a Seq and outputs a Map associating to each a the smallest index at which it appeared in the sequence
mapToList takes a Map of values and positions and produces a list of values in increasing order according to the specified positions
nubSeq combines these to generate a sequence without duplicates
The whole thing should be O(n*log(n)), I believe:
module NubSeq where
import Data.Map as Map
import Data.List as List
import Data.Sequence as Seq
import Data.Function
seqToNubMap :: Ord a => Seq a -> Map a Int
seqToNubMap = foldlWithIndex (\ m k v -> insertWith min v k m) Map.empty
mapToList :: Ord a => Map a Int -> [a]
mapToList = fmap fst . List.sortBy (compare `on` snd) . Map.toList
nubSeq :: Ord a => Seq a -> Seq a
nubSeq = Seq.fromList . mapToList . seqToNubMap
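For example, this keeps the first occurrence of each element (expected output, not a pasted session):
*NubSeq> nubSeq (Seq.fromList [1,2,3,4,4,2,3,6,7,1,2,3,4])
fromList [1,2,3,4,6,7]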
Or a simpler alternative following @DavidFletcher's comment (this additionally needs the ScopedTypeVariables extension, plus import Data.Set as Set and import Data.Foldable as Fold in the same style as the imports above):
nubSeq' :: forall a. Ord a => Seq a -> Seq a
nubSeq' xs = Fold.foldr cons nil xs Set.empty where
cons :: a -> (Set a -> Seq a) -> (Set a -> Seq a)
cons x xs seen
| x `elem` seen = xs seen
| otherwise = x <| xs (Set.insert x seen)
nil :: Set a -> Seq a
nil _ = Seq.empty
Another way with an Ord constraint: use a scan to make the sets of elements that appear in each prefix of the list. Then we can filter out any element that's already been seen.
import Data.Sequence as Seq
import Data.Set as Set
nubSeq :: Ord a => Seq a -> Seq a
nubSeq xs = (fmap fst . Seq.filter (uncurry notElem)) (Seq.zip xs seens)
where
seens = Seq.scanl (flip Set.insert) Set.empty xs
Or roughly the same thing as a mapAccumL (using mapAccumL from Data.Traversable):
nubSeq' :: Ord a => Seq a -> Seq a
nubSeq' = fmap fst . Seq.filter snd . snd . mapAccumL f Set.empty
where
f s x = (Set.insert x s, (x, x `notElem` s))
(If I were using lists I would use Maybes instead of the pairs with Bool, then use catMaybes instead of filtering. There doesn't seem to be a catMaybes for Sequence though.)
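For what it's worth, such a catMaybes for Seq is easy to sketch; catMaybesSeq is a name I made up, and it needs nothing beyond Data.Sequence and the Foldable foldr:
catMaybesSeq :: Seq (Maybe a) -> Seq a
catMaybesSeq = foldr (\mx acc -> maybe acc (<| acc) mx) Seq.empty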
I think your code should be pretty efficient. Since Sequences are tree data structures, using another tree-like data structure such as Map or HashMap to store and look up the previous items doesn't make too much sense to me.
Instead, I take the first item and check for its existence in the rest. If it exists, I drop that item and proceed recursively with the rest; if not, I construct a new sequence whose first element is that unique element and whose rest is the result of nubSeq applied to the rest (note that this keeps the last occurrence of each duplicate, as the example below shows). I use ViewPatterns.
{-# LANGUAGE ViewPatterns #-}
import Data.Sequence as Seq
nubSeq :: Eq a => Seq a -> Seq a
nubSeq (viewl -> EmptyL) = empty
nubSeq (viewl -> (x :< xs)) | elemIndexL x xs == Nothing = x <| nubSeq xs
| otherwise = nubSeq xs
*Main> nubSeq . fromList $ [1,2,3,4,4,2,3,6,7,1,2,3,4]
fromList [6,7,1,2,3,4]
In Haskell, I recently found the following function useful:
listCase :: (a -> [a] -> b) -> [a] -> [b]
listCase f [] = []
listCase f (x:xs) = f x xs : listCase f xs
I used it to generate sliding windows of size 3 from a list, like this:
*Main> listCase (\_ -> take 3) [1..5]
[[2,3,4],[3,4,5],[4,5],[5],[]]
Is there a more general recursion scheme which captures this pattern? More specifically, one that allows you to generate some structure of results by repeatedly breaking the data into a "head" and a "tail"?
What you are asking for is a comonad. This may sound scarier than a monad, but it is a simpler concept (YMMV).
Comonads are Functors with additional structure:
class Functor w => Comonad w where
  extract :: w a -> a
  duplicate :: w a -> w (w a)
  duplicate = extend id
  extend :: (w a -> b) -> w a -> w b
  extend f = fmap f . duplicate
(extend and duplicate can be defined in terms of each other, as the default definitions above show)
and laws similar to the monad laws:
extract . duplicate = id
fmap extract . duplicate = id
duplicate . duplicate = fmap duplicate . duplicate
Specifically, a function of type a -> [a] -> b is really a function on non-empty lists of type a (a head together with a tail). The usual list type [a] is not an instance of a comonad, but non-empty lists are:
data NE a = T a | a :. NE a deriving Functor -- needs the DeriveFunctor extension
instance Comonad NE where
extract (T x) = x
extract (x :. _) = x
duplicate z@(T _) = T z
duplicate z@(_ :. xs) = z :. duplicate xs
The comonad laws allow essentially only this instance for non-empty lists (there is actually a second lawful one as well).
Your function then becomes
extend (take 3 . drop 1 . toList)
where toList :: NE a -> [a] is the obvious conversion (spelled out below).
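Spelling out toList and the call (my own code, using the default extend from the class above; the parentheses are needed because :. has no fixity declaration):
toList :: NE a -> [a]
toList (T x) = [x]
toList (x :. xs) = x : toList xs
*Main> toList (extend (take 3 . drop 1 . toList) (1 :. (2 :. (3 :. (4 :. T 5)))))
[[2,3,4],[3,4,5],[4,5],[5],[]]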
This is worse than the original, but extend can be written as =>> which is simpler if applied repeatedly.
For further information, you may start at What is the Comonad typeclass in Haskell?.
This looks like a special case of a paramorphism (jargon, but it helps with googling), a generalisation of primitive recursion to all initial algebras.
Reimplementing listCase
Let's have a look at how to reimplement your function using such a combinator. First we define the notion of paramorphism: a recursion principle where not only the result of the recursive call is available but also the entire substructure this call was performed on:
The type of paraList tells me that in the (:) case, I will have access to the head, the tail and the value of the recursive call on the tail and that I need to provide a value for the base case.
module ListCase where
paraList :: (a -> [a] -> b -> b) -- cons
-> b -- nil
-> [a] -> b -- resulting function on lists
paraList c n [] = n
paraList c n (x : xs) = c x xs $ paraList c n xs
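As a quick illustration (my own example, not from the original answer), the extra access to the tail lets paraList express something like Data.List.tails in one line:
tails' :: [a] -> [[a]]
tails' = paraList (\ x xs tl -> (x : xs) : tl) [[]]
-- tails' [1,2,3] == [[1,2,3],[2,3],[3],[]]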
We can now give an alternative definition of listCase:
listCase' :: (a -> [a] -> b) -> [a] -> [b]
listCase' c = paraList (\ x xs tl -> c x xs : tl) []
Considering the general case
In the general case, we are interested in building a definition of paramorphism for all data structures defined as the fixpoint of a (strictly positive) functor. We use the traditional fixpoint operator:
newtype Fix f = Fix { unFix :: f (Fix f) }
This builds an inductive structure layer by layer. The layers have an f shape, which may be better grasped by recalling the definition of List using this formalism. A layer is either Nothing (we're done!) or Just (head, tail):
newtype ListF a as = ListF { unListF :: Maybe (a, as) }
type List a = Fix (ListF a)
nil :: List a
nil = Fix $ ListF $ Nothing
cons :: a -> List a -> List a
cons = curry $ Fix . ListF . Just
Now that we have this general framework, we can define para generically for all Fix f where f is a functor:
para :: Functor f => (f (Fix f, b) -> b) -> Fix f -> b
para alg = alg . fmap (\ rec -> (rec, para alg rec)) . unFix
Of course, ListF a is a functor, meaning we can use para to reimplement paraList and listCase.
instance Functor (ListF a) where fmap f = ListF . fmap (fmap f) . unListF
paraList' :: (a -> List a -> b -> b) -> b -> List a -> b
paraList' c n = para $ maybe n (\ (a, (as, b)) -> c a as b) . unListF
listCase'' :: (a -> List a -> b) -> List a -> List b
listCase'' c = paraList' (\ x xs tl -> cons (c x xs) tl) nil
You can implement a simple bijection toList, fromList to test it if you want. I could not be bothered to reimplement take so it's pretty ugly:
toList :: [a] -> List a
toList = foldr cons nil
fromList :: List a -> [a]
fromList = paraList' (\ x _ tl -> x : tl) []
*ListCase> fmap fromList . fromList . listCase'' (\ _ as -> toList $ take 3 $ fromList as). toList $ [1..5]
[[2,3,4],[3,4,5],[4,5],[5],[]]
I have a type synonym type Entity = ([Feature], Body) for whatever Feature and Body mean. Objects of Entity type are to be grouped together:
type Bunch = [Entity]
and the assumption, crucial for the algorithm working with Bunch, is that any two entities in the same bunch have the same number of features.
If I were to implement this constraint in an OOP language, I would add the corresponding check to the method encapsulating the addition of entities into a bunch.
Is there a better way to do it in Haskell? Preferably, on the definition level. (If the definition of Entity also needs to be changed, no problem.)
Using type-level length annotations
So here's the deal. Haskell does have type-level natural numbers, and you can annotate a list with its length using "phantom types". However you do it, the types will look like this:
data Z
data S n
data LAList x len = LAList [x] -- length-annotated list
Then you can add some construction functions for convenience:
lalist1 :: x -> LAList x (S Z)
lalist1 x = LAList [x]
lalist2 :: x -> x -> LAList x (S (S Z))
lalist2 x y = LAList [x, y]
-- ...
And then you've got more generic methods:
(~:) :: x -> LAList x n -> LAList x (S n)
x ~: LAList xs = LAList (x : xs)
infixr 5 ~:
nil :: LAList x Z
nil = LAList []
lahead :: LAList x (S n) -> x
lahead (LAList xs) = head xs
latail :: LAList x (S n) -> LAList x n
latail (LAList xs) = tail xs
but the standard list type doesn't come with any of this by itself, because it complicates things. You may be interested in the Data.FixedList package for a somewhat different approach, too. Basically every approach is going to start off looking a little weird, with some data type that has no constructors, but it starts to look normal after a little bit.
You might also be able to get a typeclass so that all of the lalist1, lalist2 operators above can be replaced with
class FixedLength t where
la :: t x -> LAList x n
but you will probably need the -XTypeSynonymInstances flag to do this, as you want to do something like
type Pair x = (x, x)
instance FixedLength Pair where
la :: Pair x -> LAList x (S (S Z))
la (a, b) = LAList [a, b]
(it's a kind mismatch when you go from (a, b) to Pair a).
Using runtime checking
You can very easily take a different approach and encapsulate all of this as a runtime error or explicitly model the error in your code:
-- this may change if you change your definition of the Bunch type
features :: Entity -> [Feature]
features = fst
-- we also assume a runBunch :: [Entity] -> Something function
-- that you're trying to run on this Bunch.
allTheSame :: (Eq x) => [x] -> Bool
allTheSame (x : xs) = all (x ==) xs
allTheSame [] = True
permissiveBunch :: [Entity] -> Maybe Something
permissiveBunch es
| allTheSame (map (length . features) es) = Just (runBunch es)
| otherwise = Nothing
strictBunch :: [Entity] -> Something
strictBunch es
| allTheSame (map (length . features) es) = runBunch es
| otherwise = error ("runBunch requires all feature lists to be the same length; saw instead " ++ show (map (length . features) es))
Then your runBunch can just assume that all the lengths are the same, since that is explicitly checked for above. You can get around pattern-matching weirdness with, say, the zip :: [a] -> [b] -> [(a, b)] function in the Prelude, if you need to pair up the features next to each other. (The concern here would be an algorithm that pattern-matches on both lists at once, say runBunch' (x:xs) (y:ys) and runBunch' [] []; Haskell then warns that there are 2 patterns which you've not considered in the match.)
Using tuples and type classes
One final way to do it which is a compromise between the two (but makes for pretty good Haskell code) involves making Entity parametrized over all features:
type Entity x = (x, Body)
and then including a function which can zip different entities of different lengths together:
class ZippableFeatures z where
fzip :: z -> z -> [(Feature, Feature)]
instance ZippableFeatures () where
fzip () () = []
instance ZippableFeatures Feature where
fzip f1 f2 = [(f1, f2)]
instance ZippableFeatures (Feature, Feature) where
fzip (a1, a2) (b1, b2) = [(a1, b1), (a2, b2)]
Then you can use tuples for your feature lists, as long as they don't get any larger than the maximum tuple length (which is 15 on my GHC). If you go larger than that, of course, you can always define your own data types, but it's not going to be as general as type-annotated lists.
If you do this, your type signature for runBunch will simply look like:
runBunch :: (ZippableFeatures z) => [Entity z] -> Something
When you run it on things with the wrong number of features you'll get compiler errors that it can't unify the type (a, b) with (a, b, c).
There are various ways to enforce length constraints like that; here's one:
{-# LANGUAGE DataKinds, KindSignatures, GADTs, TypeFamilies #-}
import Prelude hiding (foldr)
import Data.Foldable
import Data.Monoid
import Data.Traversable
import Control.Applicative
data Feature -- Whatever that really is
data Body -- Whatever that really is
data Nat = Z | S Nat -- Natural numbers
type family Plus (m :: Nat) (n :: Nat) :: Nat where -- Type-level natural number addition
Plus Z n = n
Plus (S m) n = S (Plus m n)
data LList (n :: Nat) a where -- Lists tagged with their length at the type level
Nil :: LList Z a
Cons :: a -> LList n a -> LList (S n) a
Some functions on these lists:
llHead :: LList (S n) a -> a
llHead (Cons x _) = x
llTail :: LList (S n) a -> LList n a
llTail (Cons _ xs) = xs
llAppend :: LList m a -> LList n a -> LList (Plus m n) a
llAppend Nil ys = ys
llAppend (Cons x xs) ys = Cons x (llAppend xs ys)
data Entity n = Entity (LList n Feature) Body
data Bunch where
Bunch :: [Entity n] -> Bunch
Some instances:
instance Functor (LList n) where
fmap f Nil = Nil
fmap f (Cons x xs) = Cons (f x) (fmap f xs)
instance Foldable (LList n) where
foldMap f Nil = mempty
foldMap f (Cons x xs) = f x `mappend` foldMap f xs
instance Traversable (LList n) where
traverse f Nil = pure Nil
traverse f (Cons x xs) = Cons <$> f x <*> traverse f xs
And so on. Note that n in the definition of Bunch is existential. It can be anything, and what it actually is doesn't affect the type; all bunches have the same type. This limits what you can do with bunches to a certain extent. Alternatively, you can tag the bunch with the length of its feature lists. It all depends on what you need to do with this stuff in the end.
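A hedged usage sketch (f1, f2, f3, b1, b2 stand for assumed values of the abstract Feature and Body types above, so this is illustrative rather than directly compilable):
e1, e2 :: Entity (S (S Z))
e1 = Entity (Cons f1 (Cons f2 Nil)) b1
e2 = Entity (Cons f3 (Cons f3 Nil)) b2

sameLength :: Bunch
sameLength = Bunch [e1, e2]              -- accepted: both entities have two features
-- Bunch [e1, Entity (Cons f1 Nil) b1]   -- rejected: S (S Z) does not match S Z

The tagged alternative mentioned above could look like:
data TaggedBunch (n :: Nat) = TaggedBunch [Entity n]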
I have a problem with a map function and recursion.
I have a Tree data structure like this:
data Tree a = T a [Tree a] deriving (Eq,Ord,Show)
I already have a working function to count "through" the Tree:
count :: Tree a -> Int
count (T _ xs) = 1 + sum(map count xs)
Now I want a function that checks every element of the tree against a predicate:
filterKnoten :: (a -> Bool) -> Tree a -> [a]
filterKnoten p (T x []) = (if p(x) then [x] else [])
filterKnoten p (T x xs)
| p(x) == True = x:(map multP xs)
| p(x) == False = map multP xs
where multP = filterKnoten p
Some sample data would be:
ex1 = T True [T False [T True []], T True []]
Now when I call the function with, for example:
filterKnoten (==True) ex1
As a result I want a list with all elements that fit my predicate, but the compiler gives me this when I load the module:
Couldn't match type `a' with `[a]'
`a' is a rigid type variable bound by
the type signature for filterKnoten :: (a -> Bool) -> Tree a -> [a]
at WS11.hs:138:17
Expected type: Tree a -> a
Actual type: Tree a -> [a]
In the first argument of `map', namely `multP'
In the expression: map multP xs
In an equation for `filterKnoten':
filterKnoten p (T x xs)
| p (x) == True = x : (map multP xs)
| p (x) == False = map multP xs
where
multP = filterKnoten p
Failed, modules loaded: none.
So my question: why does map work with count but not with filterKnoten?
Thanks in advance
I imagine that writing the function using recursion is a useful exercise, but from a practical standpoint, you can just derive all of these functions with GHC extensions:
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Data.Foldable
import Data.Traversable
data Tree a = T a [Tree a] deriving (Eq,Ord,Show, Functor, Traversable, Foldable)
filterTree :: (a -> Bool) -> Tree a -> [a]
filterTree f = filter f . toList -- toList is defined in Foldable
You don't need Traversable for this example; it is just one of the other useful things you can derive.
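For the sample tree from the question this gives, as a quick check:
*Main> filterTree (== True) (T True [T False [T True []], T True []])
[True,True,True]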
Your major problem is that you're mapping a function of type Tree a -> [a], which gives you back [[a]] instead of the [a] you want. This can be fixed by changing map to concatMap from Data.List:
import Data.List
...
filterKnoten :: (a -> Bool) -> Tree a -> [a]
filterKnoten p (T x []) = if p x then [x] else []
filterKnoten p (T x xs)
| p x = x:(concatMap multP xs)
| otherwise = concatMap multP xs
where multP = filterKnoten p
Notice that I've also gotten rid of the useless ==True and ==False. x == True is precisely the same as x, and when we're talking about two cases there's no point in computing the same thing twice, especially since it's potentially very expensive. otherwise is just a synonym for True that the Prelude provides. I've also gotten rid of some of the unnecessary parens.
Finally, you can just dump the entire first case since map and concatMap work on empty lists.
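Putting that last simplification in, a minimal final version might look like this:
filterKnoten :: (a -> Bool) -> Tree a -> [a]
filterKnoten p (T x xs)
  | p x       = x : concatMap (filterKnoten p) xs
  | otherwise = concatMap (filterKnoten p) xs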