Using a different ordering on lists - haskell

In Haskell, the default ordering for [a], given an ordering on a, seems to be a lexicographic ordering (side question: where can I find out if this is really the case?). What I want is a graded lexicographic ordering (also called "length plus lexicographic" ordering).
How would I specify that I want comparisons to be done in a graded lexicographical manner? I want it for only one type, not for all [a]. I tried this:
instance Ord [Int] where
    compare xs ys = case compare (length xs) (length ys) of
        LT -> LT
        GT -> GT
        EQ -> lexicographic_compare xs ys
but got this error message:
[1 of 1] Compiling Main             ( test.hs, interpreted )

test.hs:1:10:
    Illegal instance declaration for `Ord [Int]'
      (All instance types must be of the form (T a1 ... an)
       where a1 ... an are *distinct type variables*,
       and each type variable appears at most once in the instance head.
       Use -XFlexibleInstances if you want to disable this.)
    In the instance declaration for `Ord [Int]'
Failed, modules loaded: none.
Thanks for any and all help!

This is a typical application for a newtype wrapper:
import Data.Ord (comparing)
import Data.Monoid ((<>))

newtype GradedLexOrd a = GradedLexOrd { runGradedLexOrd :: [a] }
    deriving (Eq) -- Ord needs an Eq instance; the derived one agrees with the ordering below

instance (Ord a) => Ord (GradedLexOrd a) where
    compare (GradedLexOrd xs) (GradedLexOrd ys) = gradedLexOrd xs ys

gradedLexOrd :: Ord a => [a] -> [a] -> Ordering
gradedLexOrd = comparing length <> compare -- Nice Monoid-based implementation,
                                           -- due to Aaron Roth (see answer below)
Alternatively, you could work with plain lists, but instead of the Ord-constrained functions like sort, use the more general alternatives that accept a custom comparison function, e.g. sortBy gradedLexOrd.
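For instance, here is a minimal usage sketch (the names sortGradedLex and sortGradedLex' are mine, not from the answer), assuming the definitions above:

import Data.List (sort, sortBy)

-- sort through the newtype wrapper...
sortGradedLex :: Ord a => [[a]] -> [[a]]
sortGradedLex = map runGradedLexOrd . sort . map GradedLexOrd

-- ...or keep plain lists and pass the comparison explicitly
sortGradedLex' :: Ord a => [[a]] -> [[a]]
sortGradedLex' = sortBy gradedLexOrd

-- e.g. sortGradedLex' [[1,2],[2],[1]]  ==  [[1],[2],[1,2]]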

There are two questions here:
What does Ord [a] look like?
Of course you can experiment within GHCi, but maybe you want something more reliable. This is surprisingly difficult, especially as the definition of Lists is (due to their special syntax) built into the compiler. Let’s ask GHCi:
Prelude> :info []
data [] a = [] | a : [a] -- Defined in `GHC.Types'
instance Eq a => Eq [a] -- Defined in `GHC.Classes'
instance Monad [] -- Defined in `GHC.Base'
instance Functor [] -- Defined in `GHC.Base'
instance Ord a => Ord [a] -- Defined in `GHC.Classes'
instance Read a => Read [a] -- Defined in `GHC.Read'
instance Show a => Show [a] -- Defined in `GHC.Show'
It says that the instance is defined in GHC.Classes, which we find in GHC’s git repo, and there it says:
instance (Ord a) => Ord [a] where
    {-# SPECIALISE instance Ord [Char] #-}
    compare []     []     = EQ
    compare []     (_:_)  = LT
    compare (_:_)  []     = GT
    compare (x:xs) (y:ys) = case compare x y of
                                EQ    -> compare xs ys
                                other -> other
So yes, it is indeed the lexicographic ordering.
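A quick check of the difference (my own example, not from the answer):

Prelude> compare [1,2] [2]
LT

The default instance decides as soon as it sees 1 < 2; a graded (length-first) ordering would instead answer GT, because [1,2] is longer.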
How to overwrite the ordering?
Don’t. There is an instance for [a] and there can be only one. With FlexibleInstances and OverlappingInstances you could make GHC use an alternative instance for, say, [Int], but it is bad style. As leftaroundabout writes, use a newtype wrapper for it, or use parametrized functions like sortBy.

Creating a whole new Ord instance for lists of Ints seems a bit heavyweight for my taste (not to mention that you may be sowing confusion: someone who comes along to your code later will probably expect the default, non-graded lexicographic comparison behavior).
If you're merely hoping not to have to copy your custom comparison code every time you use sortBy or the like, there's actually a fairly lightweight way of defining chained comparison functions like yours on the spot. Ordering, as it happens, is an instance of Monoid, which means you can compare two things according to a succession of criteria, then combine the resulting Orderings of those comparisons using the Monoid function, mappend (recently abbreviated to <>). This is all explained in some detail in the Learn You a Haskell chapter on Monoids, etc., which is where I picked up the trick. So:
import Data.Monoid ((<>))
import Data.Ord (comparing)
gradedLexicographicCompare :: (Ord a) => [a] -> [a] -> Ordering
gradedLexicographicCompare xs ys = comparing length xs ys <> comparing id xs ys
(Of course, comparing id is just compare, but for the sake of uniformity...) Then it becomes relatively unburdensome to write things like
f = ... sortBy s ...
  where
    ...
    s xs ys = comparing length xs ys <> compare xs ys
    ...
And this also has the virtue that your successor will see immediately that you're using a custom comparison function.
Update: leftaroundabout points out below that we can achieve even greater elegance -- this is Haskell after all, and in Haskell we can always achieve greater elegance -- by making use of the monoid instance, instance Monoid b => Monoid (a -> b). That is, a function whose result is a monoid can itself be considered a monoid. The instance is given by
instance Monoid b => Monoid (a -> b) where
    mempty _ = mempty
    mappend f g x = f x `mappend` g x                                    (1)
Now let's indulge in a little equational reasoning and see what comparing length <> compare expands to according to this instance. Applying (1) once, we have
  comparing length <> compare
= mappend (comparing length) compare
= \xs -> mappend ((comparing length) xs) (compare xs)                    (2)
But ((comparing length) xs) :: [a] -> Ordering and (compare xs) :: (Ord a) => a -> Ordering are themselves functions whose results are monoids, namely Orderings, so we can apply (1) a second time to obtain
  mappend ((comparing length) xs) (compare xs)
= \ys -> mappend (((comparing length) xs) ys) ((compare xs) ys)          (3)
But now (((comparing length) xs) ys) and ((compare xs) ys) are fully applied; specifically, they are Orderings, and from the original answer we know how to combine two Orderings using mappend from the Ordering instance of Monoid. (Note that we are not using mappend from (1).) Writing everything down in one big chain, we have
  comparing length <> compare
= mappend (comparing length) compare                                     [definition of <>]
= \xs -> mappend ((comparing length) xs) (compare xs)                    [by (1)]
= \xs -> (\ys -> mappend (((comparing length) xs) ys) ((compare xs) ys)) [substituting (3) in (2)]
= \xs -> \ys -> mappend (comparing length xs ys) (compare xs ys)         [function application is left associative]
= \xs -> \ys -> comparing length xs ys <> compare xs ys                  [definition of <>]
And the last line of this expansion is just our original gradedLexicographicCompare! After a long, long digression, then, the punchline is that we can write the elegantly points-free
gradedLexicographicCompare = comparing length <> compare
Pretty.
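As a quick sanity check (my own example, assuming Data.List.sortBy is in scope):

sortBy gradedLexicographicCompare ["b", "aa", "a"]
-- == ["a", "b", "aa"]  (shorter strings first, ties broken lexicographically)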


Changing recursive guards into higher order functions

I'm trying to convert basic functions into higher order functions (specifically map, filter, or foldr). I was wondering if there are any simple concepts to apply where I could see old functions I've written using guards and turn them into higher order.
I'm working on changing a function called filterFirst that removes the first element from the list (second argument) that does not satisfy a given predicate function (first argument).
filterFirst :: (a -> Bool) -> [a] -> [a]
filterFirst _ [] = []
filterFirst x (y:ys)
    | x y       = y : filterFirst x ys
    | otherwise = ys
For example:

greaterOne :: (Num a, Ord a) => a -> Bool
greaterOne x = x > 1
filterFirst greaterOne [5,-6,-7,9,10]
[5,-7,9,10]
Based on the basic recursion, I was wondering if there might be a way to translate this (and similar functions) to higher order map, filter, or foldr. I'm not very advanced and these functions are new to me.
There is a higher-order function that's appropriate here, but it's not in the base library. What's the trouble with foldr? If you just fold over the list, you'll end up rebuilding the whole thing, including the part after the deletion.
A more appropriate function for the job is para from the recursion-schemes package (I've renamed one of the type variables):
para :: Recursive t => (Base t (t, r) -> r) -> t -> r
In the case of lists, this specializes to
para :: (ListF a ([a], r) -> r) -> [a] -> r
where
data ListF a b = Nil | Cons a b
    deriving (Functor, ....)
This is pretty similar to foldr. The recursion-schemes equivalent of foldr is
cata :: Recursive t => (Base t r -> r) -> t -> r
Which specializes to
cata :: (ListF a r -> r) -> [a] -> r
Take a break here and figure out why the type of cata is basically equivalent to that of foldr.
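One way to see it (a sketch of my own; the name foldrViaCata is hypothetical, and I assume Data.Functor.Foldable is imported): a ListF a r value is either Nil, which plays the role of foldr's base case, or Cons a r, which carries exactly the two arguments of foldr's combining function.

foldrViaCata :: (a -> r -> r) -> r -> [a] -> r
foldrViaCata f z = cata go
  where
    go Nil        = z     -- plays the role of foldr's base case
    go (Cons a r) = f a r -- plays the role of foldr's combining step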
The difference between cata and para is that para passes the folding function not only the result of folding over the tail of the list, but also the tail of the list itself. That gives us an easy and efficient way to produce the rest of the list after we've found the first non-matching element:
filterFirst :: (a -> Bool) -> [a] -> [a]
filterFirst f = para go
  where
    -- go :: ListF a ([a], [a]) -> [a]
    go (Cons a (tl, r))
      | f a       = a : r
      | otherwise = tl
    go Nil = []
para is a bit awkward for lists, since it's designed to fit into a more general context. But just as cata and foldr are basically equivalent, we could write a slightly less awkward function specifically for lists.
foldrWithTails
  :: (a -> [a] -> b -> b)
  -> b -> [a] -> b
foldrWithTails f n = go
  where
    go (a : as) = f a as (go as)
    go []       = n
Then
filterFirst :: (a -> Bool) -> [a] -> [a]
filterFirst f = foldrWithTails go []
  where
    go a tl r
      | f a       = a : r
      | otherwise = tl
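A quick check (my own, reusing greaterOne from the question):

filterFirst greaterOne [5,-6,-7,9,10]
-- == [5,-7,9,10], with either the para- or the foldrWithTails-based version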
First, let's flip the argument order of your function. This will make a few steps easier, and we can flip it back when we're done. (I'll call the flipped version filterFirst'.)
filterFirst' :: [a] -> (a -> Bool) -> [a]
filterFirst' [] _ = []
filterFirst' (y:ys) x
    | x y       = y : filterFirst' ys x
    | otherwise = ys
Note that filterFirst' ys (const True) = ys for all ys. Let's substitute that in place:
filterFirst' :: [a] -> (a -> Bool) -> [a]
filterFirst' [] _ = []
filterFirst' (y:ys) x
    | x y       = y : filterFirst' ys x
    | otherwise = filterFirst' ys (const True)
Use if-else instead of a guard:
filterFirst' :: [a] -> (a -> Bool) -> [a]
filterFirst' [] _ = []
filterFirst' (y:ys) x = if x y then y : filterFirst' ys x else filterFirst' ys (const True)
Move the second argument to a lambda:
filterFirst' :: [a] -> (a -> Bool) -> [a]
filterFirst' [] = \_ -> []
filterFirst' (y:ys) = \x -> if x y then y : filterFirst' ys x else filterFirst' ys (const True)
And now this is something we can turn into a foldr. The pattern we were going for is that filterFirst' (y:ys) can be expressed in terms of filterFirst' ys, without using ys otherwise, and we're now there.
filterFirst' :: Foldable t => t a -> (a -> Bool) -> [a]
filterFirst' = foldr (\y f -> \x -> if x y then y : f x else f (const True)) (\_ -> [])
Now we just need to neaten it up a bit:
filterFirst' :: Foldable t => t a -> (a -> Bool) -> [a]
filterFirst' = foldr go (const [])
  where
    go y f x
      | x y       = y : f x
      | otherwise = f (const True)
And flip the arguments back:
filterFirst :: Foldable t => (a -> Bool) -> t a -> [a]
filterFirst = flip $ foldr go (const [])
  where
    go y f x
      | x y       = y : f x
      | otherwise = f (const True)
And we're done. filterFirst implemented in terms of foldr.
Addendum: Although filter isn't strong enough to build this, filterM is when used with the State monad:
{-# LANGUAGE FlexibleContexts #-}

import Control.Monad.State

filterFirst :: (a -> Bool) -> [a] -> [a]
filterFirst x ys = evalState (filterM go ys) False
  where
    go y = do
      alreadyDropped <- get
      if alreadyDropped || x y
        then return True
        else do
          put True
          return False
If we really want, we can write filterFirst using foldr, since foldr is kind of "universal" -- it allows any list transformation we can perform using recursion. The main downside is that the resulting code is rather counter-intuitive. In my opinion, explicit recursion is far better in this case.
Anyway here's how it is done. This relies on what I consider to be an antipattern, namely "passing four arguments to foldr". I call this an antipattern since foldr is usually called with three arguments only, and the result is not a function taking a fourth argument.
filterFirst :: (a -> Bool) -> [a] -> [a]
filterFirst p xs = foldr go (\_ -> []) xs True
  where
    go y ys True
      | p y       = y : ys True
      | otherwise = ys False
    go y ys False = y : ys False
Clear? Not very much. The trick here is to exploit foldr to build a function Bool -> [a] which returns the original list if called with False, and the filtered-first list if called with True. If we craft that function using
foldr go baseCase xs
the result is then obviously
foldr go baseCase xs True
Now, the base case must handle the empty list, and in such case we must return a function returning the empty list, whatever the boolean argument is. Hence, we arrive at
foldr go (\_ -> []) xs True
Now, we need to define go. It takes as arguments a list element y and the result of the "recursion" ys (a function Bool -> [a] for the rest of the list), and it must return a function Bool -> [a] for the larger list. So let's also take a boolean argument, and finally make go return a list. Well, if the boolean is False we must return the list unchanged, so
go y ys False = y : ys False
Note that ys False means "the tail unchanged", so we are really rebuilding the whole list unchanged.
If instead the boolean is true, we query the predicate as in p y. If that is false, we discard y, and return the list tail unchanged
go y ys True
| p y = -- TODO
| otherwise = ys False
If p y is true, we keep y and we return the list tail filtered.
go y ys True
| p y = y : ys True
| otherwise = ys False
As a final note, we could have used a pair ([a], [a]) instead of a function Bool -> [a], but that approach does not generalize as well to more complex cases.
So, that's all. This technique is something nice to know, but I do not recommend it in real code which is meant to be understood by others.
Joseph and chi's answers already show how to derive a foldr implementation, so I'll try to aid intuition.
map is length-preserving, filterFirst is not, so trivially map must be unsuited for implementing filterFirst.
filter (and indeed map) are memoryless - the same predicate/function is applied to each element of the list, regardless of the result on other elements. In filterFirst, behaviour changes once we see the first non-satisfactory element and remove it, so filter (and map) are unsuited.
foldr is used to reduce a structure to a summary value. It's very general, and without experience it might not be immediately obvious what sorts of things this can cover. filterFirst is in fact such an operation, though. The intuition is something like: "can we build it in a single pass through the structure, building it up as we go (with additional state stored as required)?" I fear Joseph's answer obfuscates things a little: with foldr taking four parameters it may not be immediately obvious what's going on, so let's try it a little differently.
filterFirst p xs = snd $ foldr
    (\a (deleted, acc) ->
        if not deleted && not (p a) then (True, acc) else (deleted, a : acc))
    (False, []) xs
Here's a first attempt. The "extra state" here is obviously the bool indicating whether or not we've deleted an element yet, and the list accumulates in the second element of the tuple. At the end we call snd to obtain just the list. This implementation has the problem, however, that we delete the rightmost element not satisfying the predicate, because foldr first combines the rightmost element with the neutral element, then the second-rightmost, and so on.
filterFirst p xs = snd $ foldl
    (\(deleted, acc) a ->
        if not deleted && not (p a) then (True, acc) else (deleted, a : acc))
    (False, []) xs
Here, we try using foldl. This does delete the leftmost non-satisfactory element, but has the side effect of reversing the list. We can stick a reverse at the front, and this solves the problem (as sketched below), but is somewhat unsatisfactory due to the double traversal.
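That fix looks like this (a sketch of my own, not from the original answer):

filterFirst p xs =
    reverse . snd $ foldl
        (\(deleted, acc) a ->
            if not deleted && not (p a) then (True, acc) else (deleted, a : acc))
        (False, []) xs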
Then, if you go back to foldr, having realized that (basically) foldr is the correct variant if you want to transform a list while preserving order, you play with it for a while and end up writing what Joseph suggested. I do however agree with chi that straightforward recursion is the best solution here.
Your function can also be expressed as an unfold, or, more specifically, as an apomorphism. Allow me to begin with a brief explanatory note, before the solution itself.
The apomorphism is the recursion scheme dual to the paramorphism (see dfeuer's answer for more about the latter). Apomorphisms are examples of unfolds, which generate a structure from a seed. For instance, Data.List offers unfoldr, a list unfold.
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
The function given to unfoldr takes a seed and either produces a list element and a new seed (if the maybe-value is a Just) or terminates the list generation (if it is Nothing). Unfolds are more generally expressed by the ana function from recursion-schemes ("ana" is short for "anamorphism").
ana :: Corecursive t => (a -> Base t a) -> a -> t
Specialised to lists, this becomes...
ana @[_] :: (b -> ListF a b) -> b -> [a]
... which is unfoldr in different clothing.
An apomorphism is an unfold in which the generation of the structure can be short-circuited at any point of the process, by producing, instead of a new seed, the rest of the structure in one fell swoop. In the case of lists, we have:
apo @[_] :: (b -> ListF a (Either [a] b)) -> b -> [a]
Either is used to trigger the short-circuit: with a Left result, the unfold short-circuits, while with a Right it proceeds normally.
The solution in terms of apo is fairly direct:
{-# LANGUAGE LambdaCase #-}

import Data.Functor.Foldable

filterFirst :: (a -> Bool) -> [a] -> [a]
filterFirst p = apo go
  where
    go = \case
      [] -> Nil
      a : as
        | p a -> Cons a (Right as)
        | otherwise -> case as of
            [] -> Nil
            b : bs -> Cons b (Left bs)
It is somewhat more awkward than dfeuer's para-based solution, because if we want to short-circuit without an empty list for a tail we are compelled to emit one extra element (the b in the short-circuiting case), and so we have to look one position ahead. This awkwardness would grow by orders of magnitude if, rather than filterFirst, we were to implement plain old filter with an unfold, as beautifully explained in List filter using an anamorphism.
This answer is inspired by a comment from luqui on a now-deleted question.
filterFirst can be implemented in a fairly direct way in terms of span:
filterFirst :: (a -> Bool) -> [a] -> [a]
filterFirst p = (\(yeas, rest) -> yeas ++ drop 1 rest) . span p
span :: (a -> Bool) -> [a] -> ([a], [a]) splits the list in two at the first element for which the condition doesn't hold. After span, we drop the first element of the second part of the list (with drop 1 rather than tail so that we don't have to add a special case for []), and reassemble the list with (++).
As an aside, there is a near-pointfree spelling of this implementation which I find too pretty not to mention:
import Data.Bifunctor (second)

filterFirst :: (a -> Bool) -> [a] -> [a]
filterFirst p = uncurry (++) . second (drop 1) . span p
While span is a higher order function, it would be perfectly understandable if you found this implementation disappointing in the context of your question. After all, span is not much more fundamental than filterFirst itself. Shouldn't we try going a little deeper, to see if we can capture the spirit of this solution while expressing it as a fold, or as some other recursion scheme?
I believe functions like filterFirst can be fine demonstrations of hylomorphisms. A hylomorphism is an unfold (see my other answer for more on that) that generates an intermediate data structure, followed by a fold which turns this data structure into something else. Though it might look like that would require two passes to get a result (one through the input structure, and another through the intermediate one), if the hylomorphism is implemented properly (as done in the hylo function of recursion-schemes) it can be done in a single pass, with the fold consuming pieces of the intermediate structure as they are generated by the unfold (so that we don't have to actually build it all only to tear it down).
Before we start, here is the boilerplate needed to run what follows:
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TemplateHaskell #-}
import Data.Functor.Foldable
import Data.Functor.Foldable.TH
The strategy here is picking an intermediate data structure for the hylomorphism that expresses the essence of what we want to achieve. In this case, we will use this cute thing:
data BrokenList a = Broken [a] | Unbroken a (BrokenList a)
    -- I won't actually use those instances here,
    -- but they are nice to have if you want to play with the type.
    deriving (Eq, Show, Functor, Foldable, Traversable)

makeBaseFunctor ''BrokenList
BrokenList is very much like a list (Broken and Unbroken mirror [] and (:), while the makeBaseFunctor incantation generates a BrokenListF base functor analogous to ListF, with BrokenF and UnbrokenF constructors), except that it has another list attached at its end (the Broken constructor). It expresses, in a quite literal way, the idea of a list being divided in two parts.
With BrokenList at hand, we can write the hylomorphism. coalgSpan is the operation used for the unfold, and algWeld, the one used for the fold.
filterFirst p = hylo algWeld coalgSpan
  where
    coalgSpan = \case
      [] -> BrokenF []
      x : xs
        | p x -> UnbrokenF x xs
        | otherwise -> BrokenF xs
    algWeld = \case
      UnbrokenF x yeas -> x : yeas
      BrokenF rest -> rest
coalgSpan breaks the list upon hitting an element x such that p x doesn't hold. Not adding that element to the second part of the list (BrokenF xs rather than BrokenF (x : xs)) takes care of the filtering. As for algWeld, it is used to concatenate the two parts (it is very much like what we would use to implement (++) using cata).
(For a similar example of BrokenList in action, see the breakOn implementation in Note 5 of this older answer of mine. It suggests what it would take to implement span using this strategy.)
There are at least two good things about this hylo-based implementation. Firstly, it has good performance (casual testing suggests that, if compiled with optimisations, it is at least as good as, and possibly slightly faster than, the most efficient implementations in other answers here). Secondly, it reflects very closely your original, explicitly recursive implementation of filterFirst (or, at any rate, more closely than the fold-only and unfold-only implementations).

Converting a foldl into foldl1

I am using the following fold to get the final monotonically decreasing sequence of a list.
foldl (\acc x -> if x<=(last acc) then acc ++ [x] else [x]) [(-1)] a
So [9,5,3,6,2,1] would return [6,2,1]
However, with foldl I needed to supply a start for the fold, namely [(-1)]. I was trying to turn it into a foldl1 to be able to handle any range of integers, as well as any Ord a, like so:
foldl1 (\acc x -> if x<=(last acc) then acc ++ [x] else [x]) a
But I get this error:

cannot construct the infinite type: a ~ [a]
In the second argument of (<=), namely last acc
I was under the impression that foldl1 was basically:
foldl (function) [head a] a
But I guess this isn't so? How would you go about making this fold generic for any Ord type?
I was under the impression that foldl1 was basically:
foldl (function) [head a] a
No, foldl1 is basically:
foldl function (head a) (tail a)
So the initial accumulator is not the list [head a], but head a itself.
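For lists, this amounts to the following (a minimal sketch of my own; the name myFoldl1 is hypothetical, and the real foldl1 goes through Foldable):

myFoldl1 :: (a -> a -> a) -> [a] -> a
myFoldl1 f (x:xs) = foldl f x xs
myFoldl1 _ []     = error "myFoldl1: empty list"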
How would you go about making this fold generic for any Ord type?
Well a quick fix is:
foldl (\acc x -> if x<=(last acc) then acc ++ [x] else [x]) [head a] (tail a)
But there are still two problems:
in case a is an empty list, this function will error (while you probably want to return the empty list); and
the code is not terribly efficient since both last and (++) run in O(n).
The first problem can easily be addressed by using pattern matching to prevent that scenario. But for the latter you would be better off with, for instance, a reverse approach:
f :: Ord t => [t] -> [t]
f [] = []  -- case when the empty list is given
f a = reverse $ foldl (\acc@(ac:_) x -> if x <= ac then (x:acc) else [x]) [head a] (tail a)
Furthermore, personally I am not a huge fan of if-then-else in functional programming; you can, for instance, define a helper function like:
f :: Ord t => [t] -> [t]
f [] = []  -- case when the empty list is given
f a = reverse $ foldl g [head a] (tail a)
  where
    g acc@(ac:_) x | x <= ac   = x : acc
                   | otherwise = [x]
Now reverse runs in O(n), but it is done only once. Furthermore the (:) construction runs in O(1), so all the actions in g run in O(1) (given, of course, that the comparison itself is efficient), making the algorithm itself O(n).
For your sample input it gives:
*Main> f [9,5,3,6,2,1]
[6,2,1]
The type of foldl1 is:
Foldable t => (a -> a -> a) -> t a -> a
Your function argument,
\acc x -> if x<=(last acc) then acc ++ [x] else [x]
has type:
(Ord a) => [a] -> a -> [a]
When Haskell's typechecker tries typechecking your function, it'll try unifying the type a -> a -> a (the type of the first argument of foldl1) with the type [a] -> a -> [a] (the type of your function).
To unify these types would require unifying a with [a], which would lead to the infinite type a ~ [a] ~ [[a]] ~ [[[a]]]... and so on.
The reason this works while using foldl is that the type of foldl is:
Foldable t => (b -> a -> b) -> b -> t a -> b
So [a] gets unified with b and a gets unified with the other a, leading to no problem at all.
foldl1 is limited in that it can only take functions which deal with a single type; in other words, the accumulator needs to have the same type as the elements of the input list (for instance, when folding a list of Ints, foldl1 can only return an Int, while foldl can use an arbitrary accumulator type). So you can't do this using foldl1.
With regards to making this generic for all Ord values, one possible solution is to make a new typeclass for types which state their own "least bound" value, which would then be used by your function. You can't make the function as it is generic over all Ord values, because not all Ord types have a least bound you can use.
class LowerBounded a where
    lowerBound :: a

instance LowerBounded Int where
    lowerBound = -1

finalDecreasingSequence :: (Ord a, LowerBounded a) => [a] -> [a]
finalDecreasingSequence = foldl buildSequence [lowerBound]
    where buildSequence acc x
            | x <= last acc = acc ++ [x]
            | otherwise     = [x]
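With the Int instance above, a quick check (my own example) reproduces the original result:

finalDecreasingSequence [9,5,3,6,2,1 :: Int]
-- == [6,2,1]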
You might also want to read a bit about how Haskell does its type inference, as it helps a lot in figuring out errors like the one you got.

Is Haskell's "variable" eager evaluated? [duplicate]

I am wondering why :sprint reports xs = _ in this case:
Prelude> xs = map (+1) [1..10]
Prelude> length xs
10
Prelude> :sprint xs
xs = _
but not in this case:
Prelude> xs = map (+1) [1..10] :: [Int]
Prelude> length xs
10
Prelude> :sprint xs
xs = [_,_,_,_,_,_,_,_,_,_]
Note: I am running ghci with -XNoMonomorphismRestriction. Does it have to do with the fact that the type of xs is polymorphic in the first case but not in the second? I'd like to know what's going on internally.
The gist is that the polymorphic xs has a type of the form
xs :: Num a => [a]
Typeclasses under the hood are really just functions: they take an extra argument, automatically filled in by GHC, that contains a record of the typeclass's functions. So you can think of xs as having the type
xs :: NumDict a -> [a]
So when you run
Prelude> length xs
It has to choose some type for a, and find the corresponding NumDict value. IIRC it'll fill it in with Integer, so you're actually calling a function with that dictionary and checking the length of the resulting list.
When you then :sprint xs, you once again fill in that argument, this time with a fresh type variable. But the point is that you're getting an entirely different list: you gave it a different NumDict, so it isn't forced in any way by the earlier call to length.
This is very different from the explicitly monomorphic list, since there really is only one list there; there's only one value to force, so when you call length, it forces it for all future uses of xs.
To make this a bit clearer, consider the code
import Data.List (foldl1')

data Smash a = Smash { smash :: a -> a -> a }
-- ^ Think of Monoids

intSmash :: Smash Int
intSmash = Smash (+)

listSmash :: Smash [a]
listSmash = Smash (++)

join :: Smash a -> [a] -> a
join (Smash s) xs = foldl1' s xs
This is really what type classes are like under the hood: GHC would automatically fill in that first Smash a argument for us. Now your first example is like join: we can't make any assumptions about what the output will be as we apply it to different types. But your second example is more like
join' :: [Int] -> Int
join' = join intSmash

How to parallelize a fold with Parallel Evaluation Strategies in Haskell [duplicate]

I have a function with type below:
union :: a -> a -> a
And a has an additivity property, so we can regard union as a version of (+).
Say we have an [a] and want to perform a parallel "folding"; for non-parallel folding we can only do:
foldl1' union [a]
But how to perform it in parallel?
I can demonstrate the problem with Num values and the (+) function.
For example, we have a list [1,2,3,4,5,6] and (+)
In parallel we should split
[1,2,3] (+) [4,5,6]
[1,2] (+) [3] (+) [4,5] (+) [6]
([1] (+) [2]) (+) ([3] (+) [4]) (+) ([5] (+) [6])
then we want to perform each (+) operation in parallel, and combine the results into the answer
[3] (+) [7] (+) [11] = 21
Note that we can split the list, and perform the operations in any order, because of the additivity.
Are there any ways to do that using a standard library?
You need to generalize your union to any associative binary operator ⊕ such that (a ⊕ b) ⊕ c == a ⊕ (b ⊕ c). If at the same time you even have a unit element that is neutral with respect to ⊕, you have a monoid.
The important aspect of associativity is that you can arbitrarily group chunks of consecutive elements in a list, and ⊕ them in any order, since a ⊕ (b ⊕ (c ⊕ d)) == (a ⊕ b) ⊕ (c ⊕ d) - each bracket can be computed in parallel; then you'd need to "reduce" the "sums" of all brackets, and you've got your map-reduce sorted.
In order for this parallelisation to make sense, you need the chunking operation to be faster than ⊕ - otherwise, doing ⊕ sequentially is better than chunking. One such case is when you have a random-access "list" - say, an array. Data.Array.Repa has plenty of parallelised folding functions.
If you are thinking of practising by implementing one yourself, you need to pick a sufficiently complex function ⊕ so that the benefit will show.
For example:
import Control.Parallel
import Data.List

pfold :: (Num a, Enum a) => (a -> a -> a) -> [a] -> a
pfold _ [x] = x
pfold mappend xs = (ys `par` zs) `pseq` (ys `mappend` zs) where
    len = length xs
    (ys', zs') = splitAt (len `div` 2) xs
    ys = pfold mappend ys'
    zs = pfold mappend zs'

main = print $ pfold (+) [ foldl' (*) 1 [1..x] | x <- [1..5000] ]
    -- need a more complicated computation than (+) of numbers
    -- so we produce a list of products of many numbers
Here I deliberately use an associative operation, which is called mappend only locally, to show it can work for a weaker notion than a monoid - only associativity matters for parallelism; since parallelism makes sense only for non-empty lists anyway, there is no need for mempty.
ghc -O2 -threaded a.hs
a +RTS -N1 -s
Gives 8.78 seconds total run time, whereas
a +RTS -N2 -s
Gives 5.89 seconds total run time on my dual core laptop. Obviously, no point trying more than -N2 on this machine.
What you've described is essentially a monoid. In GHCI:
Prelude> :m + Data.Monoid
Prelude Data.Monoid> :info Monoid
class Monoid a where
  mempty :: a
  mappend :: a -> a -> a
  mconcat :: [a] -> a
As you can see a monoid has three associated functions:
The mempty value is sort of like the identity element of the monoid. For example, a Num can behave as a monoid with respect to two operations: sum and product. For a sum, mempty is defined as 0. For a product, mempty is defined as 1.
mempty `mappend` a = a
a `mappend` mempty = a
The mappend function is similar to your union function. For example, for a sum of Nums mappend is defined as (+), and for a product of Nums mappend is defined as (*).
The mconcat function is similar to a fold. However because of the properties of a monoid it doesn't matter whether we fold from the left, fold from the right or fold from an arbitrary position. This is because mappend is supposed to be associative:
(a `mappend` b) `mappend` c = a `mappend` (b `mappend` c)
Note however that Haskell doesn't enforce the monoid laws. Hence if you make a type an instance of the Monoid typeclass then you're responsible to ensure that it satisfies the monoid laws.
In your case it's difficult to understand how union behaves from its type signature: a -> a -> a. Surely you can't make a type variable an instance of a typeclass. That's not allowed. You need to be more specific. What does union actually do?
To give you an example of how to make a type an instance of the monoid typeclass:
newtype Sum a = Sum { getSum :: a }

instance Num a => Monoid (Sum a) where
    mempty = Sum 0
    mappend (Sum x) (Sum y) = Sum (x + y)
That's it. We don't need to define the mconcat function because that has a default implementation that depends upon mempty and mappend. Hence when we define mempty and mappend we get mconcat for free.
Now you can use Sum as follows:
getSum . mconcat $ map Sum [1..6]
This is what's happening:
You're mapping the Sum constructor over [1..6] to produce [Sum 1, Sum 2, Sum 3, Sum 4, Sum 5, Sum 6].
You give the resulting list to mconcat which folds it to Sum 21.
You use getSum to extract the Num from Sum 21.
Note however that the default implementation of mconcat is foldr mappend mempty (i.e. it's a right fold). For most cases the default implementation is sufficient. However in your case you might want to override the default implementation:
foldParallel :: Monoid a => [a] -> a
foldParallel []  = mempty
foldParallel [a] = a
foldParallel xs  = foldParallel left `mappend` foldParallel right
    where size          = length xs
          index         = (size + size `mod` 2) `div` 2
          (left, right) = splitAt index xs
Now we can create a new instance of Monoid as follows:
data Something a = Something { getSomething :: a }

instance Monoid (Something a) where
    mempty  = unionEmpty
    mappend = union
    mconcat = foldParallel
We use it as follows:
getSomething . mconcat $ map Something [1..6]
Note that I defined mempty as unionEmpty. I don't know what type of data the union function acts on. Hence I don't know what mempty should be defined as. Thus I'm simply calling it unionEmpty. Define it as you see fit.
I know it's a long time after the OP, but I've just happened upon this and thought my experiences might be of help.
If we think about the problem, we can see that:
A fold is essentially a function that takes a list of items, and converts them to a single item which may be the same type as the items in the list, but doesn't have to be: so its type is ([a] -> b).
A parallel fold splits its input list into chunks, folds each chunk separately (in parallel), and then combines the results to derive the final result. For that we need:
A chunk size. This could be calculated with reference to the size of the input list, but that has a significant drawback: in order to determine the size of the list we have to process it, which loses the benefit of laziness. So it is better to make all chunks the same size; this could be hard-coded, but in a generic function it would be better to expose it as a parameter so that it can be varied and tuned to suit the needs of the calling application.
A function that knows how to combine results. This has the type (b -> b -> b).
A suitable generic parallel fold function is thus:
import Control.Parallel

foldParallel :: Int -> ([a] -> b) -> (b -> b -> b) -> [a] -> b
foldParallel _ fold _ [] = fold []
foldParallel chunkSize fold combine xs = par lf $ combine lf rf
  where
    (left, right) = splitAt chunkSize xs
    lf = fold left
    rf = foldParallel chunkSize fold combine right
The parallel processing is done explicitly, using the par function which kicks off the evaluation of its first operand, in parallel, and returns the second operand.
It took a while - for an ancient, imperative-programming dinosaur like me - to get my head around the fact that the definitions in the where block don't actually evaluate anything, but just set up things that can be evaluated; hence the fold named as lf can be referenced in both operands of par but is only evaluated once.
The difference that par makes is that if the function just returns combine lf rf, when that is evaluated lf needs to be evaluated, then rf, then combine lf rf. But par lf $ combine lf rf means that lf is already wholly or partly evaluated (in parallel) by the time its value is needed. And because rf is itself a parallel fold, the same is true of the folding of each subsequent chunk.
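For instance, a usage sketch (my own example; the chunk size of 1000 is arbitrary and would need tuning), assuming the foldParallel defined above:

main :: IO ()
main = print $ foldParallel 1000 sum (+) [1 .. 1000000 :: Integer]

As with the earlier pfold example, compile with ghc -O2 -threaded and run with +RTS -N2 to see whether the parallelism pays off.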

"Any function on finite lists that is defined by pairing the desired result with the argument list can always be redefined in terms of fold"

I was reading through the paper A tutorial on the universality and expressiveness of fold, and am stuck on the section about generating tuples. After showing how the normal definition of dropWhile cannot be defined in terms of fold, an example defining dropWhile using tuples was given:
dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p = fst . (dropWhilePair p)

dropWhilePair :: (a -> Bool) -> [a] -> ([a], [a])
dropWhilePair p = foldr f v
  where
    f x (ys,xs) = (if p x then ys else x : xs, x : xs)
    v = ([], [])
The paper states:
In fact, this result is an instance of a general theorem (Meertens, 1992) that states that any function on finite lists that is defined by pairing the desired result with the argument list can always be redefined in terms of fold, although not always in a way that does not make use of the original (possibly recursive) definition for the function.
I looked at Meertens's paper, but I do not have the background (category theory? type theory?) and did not quite find how this was proved.
Is there a relatively simple "proof" why this is the case? Or just a simple explanation as to why we can redefine all functions on finite lists in terms of fold if we pair the results with the original list.
Given the remark that you can / may need to use the original function inside, the claim as stated in your question seems trivial to me:
rewriteAsFold :: ([a] -> (b, [a])) -> [a] -> (b, [a])
rewriteAsFold g = foldr f v
  where
    f x ~(ys,xs) = (fst (g (x:xs)), x:xs)
    v = (fst (g []), [])
EDIT: Added the ~, after which it seems to work for infinite lists as well.
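For example, instantiating it with the paper's dropWhilePair (a sketch of my own; note that it feeds the original dropWhile into g, which is exactly the caveat in the quoted theorem):

dropWhilePair' :: (a -> Bool) -> [a] -> ([a], [a])
dropWhilePair' p = rewriteAsFold (\xs -> (dropWhile p xs, xs))

dropWhile' :: (a -> Bool) -> [a] -> [a]
dropWhile' p = fst . dropWhilePair' p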
