How to avoid a space leak in multiple list traversals? - haskell

Is GHC intelligent enough to run multiple operations on lists in 'semi-parallel'?
Consider this (simplified) code:
findElements bigList = do
    let special = head . filter isSpecial $ bigList
    let others  = filter isSpecialOrNormal $ bigList
    return (special, others)
(Monad due to original code)
I guess GHC will run the first list operation and keep all the elements in memory so that the second operation is able to work on them.
My problem is that I am running into a space leak when dealing with larger files, but I believe this should be able to run in constant space. Is there a way to achieve that?
Update 1
Having written it down like this, the solution to this particular problem is of course to swap the two lines.
But my question remains: is GHC intelligent enough to figure out this semi-parallel processing when it is not done in a monad?

I don't think GHC is smart enough to merge these two traversals. Or, as is usually the case, GHC could be smart enough, but there are cases where you wouldn't want this behavior, so GHC doesn't do it.
Here's how I would do it, using monoids and foldMap.
import Data.Monoid
import Data.Foldable
First, here's how to write special with foldMap, using the First monoid.
specialF :: a -> First a
specialF a = First $ if isSpecial a then Just a else Nothing
special :: [a] -> a
special as = let (First (Just s)) = foldMap specialF as in s
And similar for specialOrNormal, using the list monoid.
specialOrNormalF :: a -> [a]
specialOrNormalF a = if isSpecialOrNormal a then [a] else []
specialOrNormal :: [a] -> [a]
specialOrNormal = foldMap specialOrNormalF
One neat thing about monoids is that a tuple of monoids is also a monoid, which makes merging these folds easy:
findElements :: [a] -> (a, [a])
findElements bigList =
    let (First (Just s), son) =
            foldMap (\a -> (specialF a, specialOrNormalF a)) bigList
    in (s, son)
And if you like point-free code, you can write the whole thing like this (on top of the imports above, this uses first and (&&&) from Control.Arrow, fromJust from Data.Maybe, and mfilter from Control.Monad):
findElements :: [a] -> (a, [a])
findElements =
    first (fromJust . getFirst) .
    foldMap
        (   First . mfilter isSpecial . return
        &&& mfilter isSpecialOrNormal . return
        )
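For what it's worth, here is a self-contained way to try the merged fold; isSpecial and isSpecialOrNormal below are stand-ins I made up, since the question never shows them:
import Data.Monoid (First (..))

-- Made-up predicates, just so the example compiles and runs.
isSpecial, isSpecialOrNormal :: Int -> Bool
isSpecial         = (== 7)
isSpecialOrNormal = even

findElements :: [Int] -> (Int, [Int])
findElements bigList =
    let (First (Just s), son) =
            foldMap (\a -> ( First (if isSpecial a then Just a else Nothing)
                           , if isSpecialOrNormal a then [a] else [] ))
                    bigList
    in (s, son)

main :: IO ()
main = print (findElements [1 .. 10])  -- (7,[2,4,6,8,10])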

Related

Maybe monad and a list

OK, so I am trying to learn how to use monads, starting out with Maybe. I've come up with an example, but I can't figure out how to apply the Maybe monad to it in a nice way, so I was hoping someone else could:
I have a list containing a bunch of values. Depending on these values, my function should return the list itself, or a Nothing. In other words, I want to do a sort of filter, but with the consequence of a hit being the function failing.
The only way I can think of is to use filter and then compare the length of the list I get back to zero. Is there a better way?
This looks like a good fit for traverse:
traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
That's a bit of a mouthful, so let's specialise it to your use case, with lists and Maybe:
GHCi> :set -XTypeApplications
GHCi> :t traverse @[] @Maybe
traverse @[] @Maybe :: (a -> Maybe b) -> [a] -> Maybe [b]
It works like this: you give it an a -> Maybe b function, which is applied to all elements of the list, just like fmap does. The twist is that the Maybe b values are then combined in a way that only gives you a modified list if there aren't any Nothings; otherwise, the overall result is Nothing. That fits your requirements like a glove:
noneOrNothing :: (a -> Bool) -> [a] -> Maybe [a]
noneOrNothing p = traverse (\x -> if p x then Nothing else Just x)
(allOrNothing would have been a more euphonic name, but then I'd have to flip the test with respect to your description.)
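A quick GHCi check of the behaviour, treating even as the "hit" predicate:
GHCi> noneOrNothing even [1,3,5]
Just [1,3,5]
GHCi> noneOrNothing even [1,2,3]
Nothing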
There are a lot of things we might discuss about the Traversable and Applicative classes. For now, I will talk a bit more about Applicative, in case you haven't met it yet. Applicative is a superclass of Monad with two essential methods: pure, which is the same thing as return, and (<*>), which is not entirely unlike (>>=) but crucially different from it. For the Maybe example...
GHCi> :t (>>=) @Maybe
(>>=) @Maybe :: Maybe a -> (a -> Maybe b) -> Maybe b
GHCi> :t (<*>) @Maybe
(<*>) @Maybe :: Maybe (a -> b) -> Maybe a -> Maybe b
... we can describe the difference like this: in mx >>= f, if mx is a Just-value, (>>=) reaches inside of it to apply f and produce a result, which, depending on what was inside mx, will turn out to be a Just-value or a Nothing. In mf <*> mx, though, if mf and mx are Just-values you are guaranteed to get a Just value, which will hold the result of applying the function from mf to the value from mx. (By the way: what will happen if mf or mx are Nothing?)
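(If you want to check your answer to that question in GHCi, a couple of probes like these will do; the type annotations are only there to keep the examples unambiguous.)
GHCi> Just (2 *) <*> Just 3
Just 6
GHCi> (Nothing :: Maybe (Int -> Int)) <*> Just 3
Nothing
GHCi> Just (2 *) <*> (Nothing :: Maybe Int)
Nothing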
traverse involves Applicative because the combining of values I mentioned at the beginning (which, in your example, turns a number of Maybe a values into a Maybe [a]) is done using (<*>). As your question was originally about monads, it is worth noting that it is possible to define traverse using Monad rather than Applicative. This variation goes by the name mapM:
mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)
We prefer traverse to mapM because it is more general -- as mentioned above, Applicative is a superclass of Monad.
On a closing note, your intuition about this being "a sort of filter" makes a lot of sense. In particular, one way to think about Maybe a is that it is what you get when you pick booleans and attach values of type a to True. From that vantage point, (<*>) works as an && for these weird booleans, which combines the attached values if you happen to supply two of them (cf. DarthFennec's suggestion of an implementation using any). Once you get used to Traversable, you might enjoy having a look at the Filterable and Witherable classes, which play with this relationship between Maybe and Bool.
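(For reference, here is a minimal sketch of the any-based implementation alluded to above; the name noneOrNothingAny is mine, not DarthFennec's.)
noneOrNothingAny :: (a -> Bool) -> [a] -> Maybe [a]
noneOrNothingAny p xs = if any p xs then Nothing else Just xs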
duplode's answer is a good one, but I think it is also helpful to learn to operate within a monad in a more basic way. It can be a challenge to learn every little monad-general function, and see how they could fit together to solve a specific problem. So, here's a DIY solution that shows how to use do notation and recursion, tools which can help you with any monadic question.
forbid :: (a -> Bool) -> [a] -> Maybe [a]
forbid _ [] = Just []
forbid p (x:xs) = if p x
    then Nothing
    else do
        remainder <- forbid p xs
        Just (x : remainder)
Compare this to an implementation of remove, the opposite of filter:
remove :: (a -> Bool) -> [a] -> [a]
remove _ [] = []
remove p (x:xs) = if p x
    then remove p xs
    else
        let remainder = remove p xs
        in x : remainder
The structure is the same, with just a couple differences: what you want to do when the predicate returns true, and how you get access to the value returned by the recursive call. For remove, the returned value is a list, and so you can just let-bind it and cons to it. With forbid, the returned value is only maybe a list, and so you need to use <- to bind to that monadic value. If the return value was Nothing, bind will short-circuit the computation and return Nothing; if it was Just a list, the do block will continue, and cons a value to the front of that list. Then you wrap it back up in a Just.
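For instance, assuming the forbid above, the short-circuiting means the rest of the list is never even looked at once the predicate hits:
GHCi> forbid even [1,3,5]
Just [1,3,5]
GHCi> forbid even (1 : 2 : error "never forced")
Nothing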

One-line implementation of split in Haskell

What I want is the following one (which I think should be included in prelude since it is very useful in text processing):
split :: Eq a => [a] -> [a] -> [[a]]
e.g.:
split ";;" "hello;;world" = ["hello", "world"]
split from Data.List.Utils isn't in base. I feel there should be a short-and-sweet implementation by composing a few base functions, but I can't figure it out. Am I missing something?
Arguably, the best way to check how feasible a short-and-sweet splitOn would be (or split, as you and MissingH call it; here I will stick to the name used by the split and extra packages) is to try writing it [note 1].
(By the way, I will use recursion-schemes functions and concepts in this answer, as I find systematising things a bit helps me think about this kind of problem. Do let me know if anything is unclear.)
The type of splitOn is [note 2]:
splitOn :: Eq a => [a] -> [a] -> [[a]]
One way to write a recursive function that builds one data structure from another, like splitOn does, begins by asking whether to do it by walking the original structure in a bottom-up or a top-down way (for lists, that amounts to right-to-left and left-to-right respectively). A bottom-up walk is more naturally expressed as some kind of fold:
foldr @[] :: (a -> b -> b) -> b -> [a] -> b
cata @[_] :: (ListF a b -> b) -> [a] -> b
(cata, short for catamorphism, is how recursion-schemes expresses a vanilla fold. The ListF a b -> b function, called an algebra in the jargon, specifies what happens in each fold step. data ListF a b = Nil | Cons a b, and so, in the case of lists, the algebra amounts to the two first arguments of foldr rolled into one -- the binary function corresponds to the Cons case, and the seed of the fold, to the Nil one.)
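As a tiny warm-up with the ListF vocabulary (nothing to do with splitOn itself; lengthCata is just a throwaway name of mine), here is length written as a cata, with Nil playing the role of foldr's seed and Cons the role of its binary function, so that lengthCata "abc" evaluates to 3:
import Data.Functor.Foldable

lengthCata :: [a] -> Int
lengthCata = cata alg
  where
    alg Nil          = 0
    alg (Cons _ acc) = 1 + acc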
A top-down walk, on the other hand, lends itself to an unfold:
unfoldr :: (b -> Maybe (a, b)) -> b -> [a] -- found in Data.List
ana @[_] :: (b -> ListF a b) -> b -> [a]
(ana, short for anamorphism, is the vanilla unfold in recursion-schemes. The b -> ListF a b function is a coalgebra; it specifies what happens in each unfold step. For a list, the possibilities are either emitting a list element and an updated seed or generating an empty list and terminating the unfold.)
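And the mirror-image warm-up for ana (again, countdownAna is merely an illustrative name): unfolding a list from a seed, emitting Nil to stop, so that countdownAna 3 evaluates to [3,2,1]:
import Data.Functor.Foldable

countdownAna :: Int -> [Int]
countdownAna = ana coalg
  where
    coalg n = if n <= 0 then Nil else Cons n (n - 1)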
Should splitOn be bottom-up or top-down? To implement it, we need to, at any given position in the list, look ahead in order to check whether the current list segment starts with the delimiter. That being so, it makes sense to reach for a top-down solution i.e. an unfold/anamorphism.
Playing with ways to write splitOn as an unfold shows another thing to consider: you want each individual unfold step to generate a fully-formed list chunk. Not doing so will, at best, lead you to unnecessarily walk the original list twice [note 3]; at worst, catastrophic memory usage and stack overflows on long list chunks await [note 4]. One way to achieve that is through a breakOn function, like the one in Data.List.Extra...
breakOn :: Eq a => [a] -> [a] -> ([a], [a])
... which is like break from the Prelude, except that, instead of applying a predicate to each element, it checks whether the remaining list segment has the first argument as a prefix [note 5].
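For concreteness, this is how it behaves on a couple of inputs:
breakOn "::" "a::b::c" == ("a", "::b::c")
breakOn "/" "foobar" == ("foobar", "")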
With breakOn at hand, we can write a proper splitOn implementation -- one that, compiled with optimisations, matches in performance the library ones mentioned at the beginning:
splitOnAtomic :: Eq a => [a] -> [a] -> [[a]]
splitOnAtomic delim
    | null delim = error "splitOnAtomic: empty delimiter"
    | otherwise = apo coalgSplit
    where
    delimLen = length delim
    coalgSplit = \case
        [] -> Cons [] (Left [])
        li ->
            let (ch, xs) = breakOn (delim `isPrefixOf`) li
            in Cons ch (if null xs then Left [] else Right (drop delimLen xs))
(apo, short for apomorphism, is an unfold that can be short-circuited. That is done by emitting from an unfold step, rather than the usual updated seed -- signaled by Right -- a final result -- signaled by Left. Short-circuiting is needed here because, in the empty list case, we want neither to produce an empty list by returning Nil -- which would wrongly result in splitOn delim [] = [] -- nor to resort to Cons [] [] -- which would generate an infinite tail of []. This trick corresponds directly to the additional splitOn _ [] = [[]] case added to the Data.List.Extra implementation.)
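Assuming the definitions above (plus the note 1 imports), the results line up with the library splitOn:
GHCi> splitOnAtomic ";;" "hello;;world"
["hello","world"]
GHCi> splitOnAtomic ";;" "hello;;"
["hello",""]
GHCi> splitOnAtomic ";;" ""
[""]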
After a few slight detours, we can now address your actual question. splitOn is tricky to write in a short way because, firstly, the recursion pattern it uses isn't entirely trivial; secondly, a good implementation requires a few details that are inconvenient for golfing; and thirdly, what appears to be the best implementation relies crucially on breakOn, which is not in base.
Notes:
[note 1]: Here are the imports and pragmas needed to run the snippets in this answer:
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TemplateHaskell #-}
import Data.Functor.Foldable
import Data.Functor.Foldable.TH
import Data.List
import Data.Maybe
[note 2]: An alternative type might be Eq a => NonEmpty a -> [a] -> NonEmpty [a], if one wants to put precision above all else. I won't bother with that here to avoid unnecessary distractions.
[note 3]: As in this rather neat implementation, which uses two unfolds -- the first one (ana coalgMark) replaces the delimiters with Nothing, so that the second one (apo coalgSplit) can split in a straightforward way:
splitOnMark :: Eq a => [a] -> [a] -> [[a]]
splitOnMark delim
    | null delim = error "splitOnMark: empty delimiter"
    | otherwise = apo coalgSplit . ana coalgMark
    where
    coalgMark = \case
        [] -> Nil
        li@(x:xs) -> case stripPrefix delim li of
            Just ys -> Cons Nothing ys
            Nothing -> Cons (Just x) xs
    coalgSplit = \case
        [] -> Cons [] (Left [])
        mxs ->
            let (mch, mys) = break isNothing mxs
            in Cons (catMaybes mch) (if null mys then Left [] else Right (drop 1 mys))
(What apo is and what Left and Right are doing here will be covered a little further in the main body of the answer.)
This implementation has fairly acceptable performance, though with optimisations it is slower than the one in the main body of the answer by a (modest) constant factor. It might be a little easier to golf this one, though...
[note 4]: As in this single unfold implementation, which uses a coalgebra that calls itself recursively to build each chunk as a (difference) list:
splitOnNaive :: Eq a => [a] -> [a] -> [[a]]
splitOnNaive delim
    | null delim = error "splitOn: empty delimiter"
    | otherwise = apo coalgSplit . (,) id
    where
    coalgSplit = \case
        (ch, []) -> Cons (ch []) (Left [])
        (ch, li@(x:xs)) -> case stripPrefix delim li of
            Just ys -> Cons (ch []) (Right (id, ys))
            Nothing -> coalgSplit (ch . (x :), xs)
Having to decide at each element whether to grow the current chunk or to start a new one is in itself problematic, as it breaks laziness.
[note 5]: Here is how Data.List.Extra implements breakOn. If we want to achieve that using a recursion-schemes unfold, one good strategy is defining a data structure that encodes exactly what we are trying to build:
data BrokenList a = Broken [a] | Unbroken a (BrokenList a)
    deriving (Eq, Show, Functor, Foldable, Traversable)
makeBaseFunctor ''BrokenList
A BrokenList is just like a list, except that the empty list is replaced by the (non-recursive) Broken constructor, which marks the break point and holds the remainder of the list. Once generated by an unfold, a BrokenList can be easily folded into a pair of lists: the elements in the Unbroken values are consed into one list, and the list in Broken becomes the other one:
breakOn :: ([a] -> Bool) -> [a] -> ([a], [a])
breakOn p = hylo algPair coalgBreak
    where
    coalgBreak = \case
        [] -> BrokenF []
        li@(x:xs)
            | p li -> BrokenF li
            | otherwise -> UnbrokenF x xs
    algPair = \case
        UnbrokenF x ~(xs, ys) -> (x : xs, ys)
        BrokenF ys -> ([], ys)
(hylo, short for hylomorphism, is simply an ana followed by a cata, i.e. an unfold followed by a fold. hylo, as implemented in recursion-schemes, takes advantage of the fact that the intermediate data structure, created by the unfold and then immediately consumed by the fold, can be fused away, leading to significant performance gains.)
It is worth mentioning that the lazy pattern match in algPair is crucial to preserve laziness. The Data.List.Extra implementation linked to above achieves that by using first from Control.Arrow, which also matches the pair given to it lazily.
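A quick way to see that laziness at work (the predicate here is an arbitrary one that never succeeds on [1..], so only the requested prefix is ever built):
GHCi> take 3 . fst $ breakOn (\xs -> take 2 xs == [0, 0]) [1 ..]
[1,2,3]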

How to implement delete with foldr in Haskell

I've been studying folds for the past few days. I can implement simple functions with them, like length, concat and filter. What I'm stuck at is trying to implement with foldr functions like delete, take and find. I have implemented these with explicit recursion but it doesn't seem obvious to me how to convert these types of functions to right folds.
I have studied the tutorials by Graham Hutton and Bernie Pope. Imitating Hutton's dropWhile, I was able to implement delete with foldr but it fails on infinite lists.
From reading Implement insert in haskell with foldr, How can this function be written using foldr? and Implementing take using foldr, it would seem that I need to use foldr to generate a function which then does something. But I don't really understand these solutions and don't have an idea how to implement for example delete this way.
Could you explain to me a general strategy for implementing with foldr lazy versions of functions like the ones I mentioned. Maybe you could also implement delete as an example since this probably is one of the easiest.
I'm looking for a detailed explanation that a beginner can understand. I'm not interested in just solutions, I want to develop an understanding so I can come up with solutions to similar problems myself.
Thanks.
Edit: At the moment of writing there is one useful answer but it's not quite what I was looking for. I'm more interested in an approach that uses foldr to generate a function, which then does something. The links in my question have examples of this. I don't quite understand those solutions so I would like to have more information on this approach.
delete is a modal search. It has two different modes of operation - whether it's already found the result or not. You can use foldr to construct a function that passes the state down the line as each element is checked. So in the case of delete, the state can be a simple Bool. It's not exactly the best type, but it will do.
Once you have identified the state type, you can start working on the foldr construction. I'm going to walk through figuring it out the way I did. I'll be enabling ScopedTypeVariables just so I can annotate the types of subexpressions better. Once you know the state type, you know you want foldr to generate a function taking a value of that type and returning a value of the desired final type. That's enough to start sketching things.
{-# LANGUAGE ScopedTypeVariables #-}

delete :: forall a. Eq a => a -> [a] -> [a]
delete a xs = foldr f undefined xs undefined
  where
    f :: a -> (Bool -> [a]) -> (Bool -> [a])
    f x g = undefined
It's a start. The exact meaning of g is a little bit tricky here. It's actually the function for processing the rest of the list. It's accurate to look at it as a continuation, in fact. It absolutely represents performing the rest of the folding, with whatever state you choose to pass along. Given that, it's time to figure out what to put in some of those undefined places.
{-# LANGUAGE ScopedTypeVariables #-}

delete :: forall a. Eq a => a -> [a] -> [a]
delete a xs = foldr f undefined xs undefined
  where
    f :: a -> (Bool -> [a]) -> (Bool -> [a])
    f x g found | x == a && not found = g True
                | otherwise           = x : g found
That seems relatively straightforward. If the current element is the one being searched for, and it hasn't yet been found, don't output it, and continue with the state set to True, indicating it's been found. Otherwise, output the current value and continue with the current state. This just leaves the rest of the arguments to foldr. The last one is the initial state. The other one is the state function for an empty list. Ok, those aren't too bad either.
{-# LANGUAGE ScopedTypeVariables #-}

delete :: forall a. Eq a => a -> [a] -> [a]
delete a xs = foldr f (const []) xs False
  where
    f :: a -> (Bool -> [a]) -> (Bool -> [a])
    f x g found | x == a && not found = g True
                | otherwise           = x : g found
No matter what the state is, produce an empty list when an empty list is encountered. And the initial state is that the element being searched for has not yet been found.
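A quick sanity check, including the infinite-list case that motivated the question (assuming the final definition above):
GHCi> delete 3 [1, 2, 3, 4, 3]
[1,2,4,3]
GHCi> take 5 (delete 3 [1 ..])
[1,2,4,5,6]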
This technique is also applicable in other cases. For instance, foldl can be written as a foldr this way. If you look at foldl as a function that repeatedly transforms an initial accumulator, you can guess that's the function being produced - how to transform the initial value.
{-# LANGUAGE ScopedTypeVariables #-}

foldl :: forall a b. (a -> b -> a) -> a -> [b] -> a
foldl f z xs = foldr g id xs z
  where
    g :: b -> (a -> a) -> (a -> a)
    g x cont acc = undefined
The base cases aren't too tricky to find when the problem is defined as manipulating the initial accumulator, named z there. The empty list is the identity transformation, id, and the value passed to the created function is z.
The implementation of g is trickier. It can't just be done blindly on types, because there are two different implementations that use all the expected values and type-check. This is a case where types aren't enough, and you need to consider the meanings of the functions available.
Let's start with an inventory of the values that seem like they should be used, and their types. The things that seem like they must need to be used in the body of g are f :: a -> b -> a, x :: b, cont :: (a -> a), and acc :: a. f will obviously take x as its second argument, but there's a question of the appropriate place to use cont. To figure out where it goes, remember that it represents the transformation function returned by processing the rest of the list, and that foldl processes the current element and then passes the result of that processing to the rest of the list.
{-# LANGUAGE ScopedTypeVariables #-}

foldl :: forall a b. (a -> b -> a) -> a -> [b] -> a
foldl f z xs = foldr g id xs z
  where
    g :: b -> (a -> a) -> (a -> a)
    g x cont acc = cont $ f acc x
This also suggests that foldl' can be written this way with only one tiny change:
{-# LANGUAGE ScopedTypeVariables #-}

foldl' :: forall a b. (a -> b -> a) -> a -> [b] -> a
foldl' f z xs = foldr g id xs z
  where
    g :: b -> (a -> a) -> (a -> a)
    g x cont acc = cont $! f acc x
The difference is that ($!) is used to suggest evaluation of f acc x before it's passed to cont. (I say "suggest" because there are some edge cases where ($!) doesn't force evaluation even as far as WHNF.)
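To convince yourself this really is foldl, a throwaway GHCi check (foldlViaFoldr is just a name picked to avoid shadowing the Prelude's foldl):
GHCi> let foldlViaFoldr f z xs = foldr (\x cont acc -> cont (f acc x)) id xs z
GHCi> foldlViaFoldr (-) 100 [1, 2, 3]
94
GHCi> foldl (-) 100 [1, 2, 3]
94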
delete doesn't operate on the entire list evenly. The structure of the computation isn't just considering the whole list one element at a time. It differs after it hits the element it's looking for. This tells you it can't be implemented as just a foldr. There will have to be some sort of post-processing involved.
When that happens, the general pattern is that you build a pair of values and just take one of them at completion of the foldr. That's probably what you did when you imitated Hutton's dropWhile, though I'm not sure since you didn't include code. Something like this?
delete :: Eq a => a -> [a] -> [a]
delete a = snd . foldr (\x (xs1, xs2) -> if x == a then (x:xs1, xs1) else (x:xs1, x:xs2)) ([], [])
The main idea is that xs1 is always going to be the full tail of the list, and xs2 is the result of the delete over the tail of the list. Since you only want to remove the first element that matches, you don't want to use the result of delete over the tail when you do match the value you're searching for, you just want to return the rest of the list unchanged - which fortunately is what's always going to be in xs1.
And yeah, that doesn't work on infinite lists - but only for one very specific reason. The lambda is too strict. foldr only works on infinite lists when the function it is provided doesn't always force evaluation of its second argument, and that lambda does always force evaluation of its second argument in the pattern match on the pair. Switching to an irrefutable pattern match fixes that, by allowing the lambda to produce a constructor before ever examining its second argument.
delete :: Eq a => a -> [a] -> [a]
delete a = snd . foldr (\x ~(xs1, xs2) -> if x == a then (x:xs1, xs1) else (x:xs1, x:xs2)) ([], [])
That's not the only way to get that result. Using a let-binding or fst and snd as accessors on the tuple would also do the job. But it is the change with the smallest diff.
The most important takeaway here is to be very careful with handling the second argument to the reducing function you pass to foldr. You want to defer examining the second argument whenever possible, so that the foldr can stream lazily in as many cases as possible.
If you look at that lambda, you see that the branch taken is chosen before doing anything with the second argument to the reducing function. Furthermore, you'll see that most of the time, the reducing function produces a list constructor in both halves of the result tuple before it ever needs to evaluate the second argument. Since those list constructors are what make it out of delete, they are what matter for streaming - so long as you don't let the pair get in the way. And making the pattern-match on the pair irrefutable is what keeps it out of the way.
As a bonus example of the streaming properties of foldr, consider my favorite example:
dropWhileEnd :: (a -> Bool) -> [a] -> [a]
dropWhileEnd p = foldr (\x xs -> if p x && null xs then [] else x:xs) []
It streams - as much as it can. If you figure out exactly when and why it does and doesn't stream, you'll understand pretty much every detail of the streaming structure of foldr.
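For instance, it streams through an infinite list as long as the predicate keeps failing, because (&&) never needs to look at null xs:
GHCi> take 5 (dropWhileEnd (< 0) [1 ..])
[1,2,3,4,5]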
Here is a simple delete, implemented with foldr:
delete :: (Eq a) => a -> [a] -> [a]
delete a xs = foldr (\x acc -> if x == a then acc else x : acc) [] xs
(Note that, unlike Data.List.delete, this removes every occurrence of a, not just the first.)

Choosing the non-empty Monoid

I need a function which will choose a non-empty monoid. For a list this will mean the following behaviour:
> [1] `mor` []
[1]
> [1] `mor` [2]
[1]
> [] `mor` [2]
[2]
Now, I've actually implemented it but am wondering whether there exists some standard alternative, because it seems to be a kind of a common case. Unfortunately Hoogle doesn't help.
Here's my implementation:
mor :: (Eq a, Monoid a) => a -> a -> a
mor a b = if a /= mempty then a else b
If your lists contain at most one element, they're isomorphic to Maybe, and for that there's the "first non empty" monoid: First from Data.Monoid. It's a wrapper around Maybe a values, and mappend returns the first Just value:
import Data.Monoid
main = do
print $ (First $ Just 'a') <> (First $ Just 'b')
print $ (First $ Just 'a') <> (First Nothing)
print $ (First Nothing) <> (First $ Just 'b')
print $ (First Nothing) <> (First Nothing :: First Char)
==> Output:
First {getFirst = Just 'a'}
First {getFirst = Just 'a'}
First {getFirst = Just 'b'}
First {getFirst = Nothing}
Conversion [a] -> Maybe a is achieved using Data.Maybe.listToMaybe.
On a side note: this one does not constrain the typeclass of the wrapped type; in your question, you need an Eq instance to compare for equality with mempty. This comes at the cost of having the Maybe type, of course.
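Putting those two remarks together, a small sketch of what that looks like for lists holding at most one element (morSingleton is my name for it, not a library function): morSingleton [1] [] gives [1], morSingleton [] [2] gives [2], and morSingleton [1] [2] gives [1].
import Data.Maybe (listToMaybe, maybeToList)
import Data.Monoid (First (..))

-- Only meaningful for lists of zero or one element, as discussed above.
morSingleton :: [a] -> [a] -> [a]
morSingleton xs ys =
    maybeToList . getFirst $ First (listToMaybe xs) <> First (listToMaybe ys)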
[This is really a long comment rather than an answer]
In my comment, when I said "monoidal things have no notion of introspection" - I meant that you can't perform analysis (pattern matching, equality, <, >, etc.) on monoids. This is obvious of course - the API of Monoids is only unit (mempty) and an operation mappend (more abstractly <>) that takes two monodial things and returns one. The definition of mappend for a type is free to use case analysis, but afterwards all you can do with monoidal things is use the Monoid API.
It's something of a folklore in the Haskell community to avoid inventing things, preferring instead to use objects from mathematics and computer science (including functional programming history). Combining Eq (which needs analysis of its arguments) and Monoid introduces a new class of things - monoids that support enough introspection to allow equality; and at this point there is a reasonable argument that an Eq-Monoid thing goes against the spirit of its Monoid superclass (Monoids are opaque). As this is both a new class of objects and potentially contentious - a standard implementation won't exist.
First, your mor function looks rather suspicious because it requires a Monoid but never uses mappend, and so it is significantly more constrained than necessary.
mor :: (Eq a, Monoid a) => a -> a -> a
mor a b = if a /= mempty then a else b
You could accomplish the same thing with merely a Default constraint:
import Data.Default
mor :: (Eq a, Default a) => a -> a -> a
mor a b = if a /= def then a else b
and I think that any use of Default should also be viewed warily because, as I believe many Haskellers complain, it is a class without principles.
My second thought is that it seems that the data type you're really dealing with here is Maybe (NonEmpty a), not [a], and the Monoid you're actually talking about is First.
import Data.Monoid
morMaybe :: Maybe a -> Maybe a -> Maybe a
morMaybe x y = getFirst (First x <> First y)
And so then we could use that with lists, as in your example, under the (nonEmpty, maybe [] toList) isomorphism between [a] and Maybe (NonEmpty a):
import Data.List.NonEmpty
morList :: [t] -> [t] -> [t]
morList x y = maybe [] toList (nonEmpty x `morMaybe` nonEmpty y)
λ> morList [1] []
[1]
λ> morList [] [2]
[2]
λ> morList [1] [2]
[1]
(I'm sure that somebody more familiar with the lens library could provide a more impressive concise demonstration here, but I don't immediately know how.)
You could extend Monoid with a predicate to test whether an element is an identity.
class Monoid a => TestableMonoid a
  where
    isMempty :: a -> Bool
    morTestable :: a -> a -> a
    morTestable x y = if isMempty x then y else x
Not every monoid can have an instance of TestableMonoid, but plenty (including list) can.
instance TestableMonoid [a]
  where
    isMempty = null
We could even then write a newtype wrapper with a Monoid:
newtype Mor a = Mor { unMor :: a }
instance TestableMonoid a => Monoid (Mor a)
where
mempty = Mor mempty
Mor x `mappend` Mor y = Mor (morTestable x y)
λ> unMor (Mor [1] <> Mor [])
[1]
λ> unMor (Mor [] <> Mor [2])
[2]
λ> unMor (Mor [1] <> Mor [2])
[1]
So that leaves open the question of whether the TestableMonoid class deserves to exist. It certainly seems like a more "algebraically legitimate" class than Default, at least, because we can give it a law that relates it to Monoid:
isMempty x iff mappend x = id
But I do question whether this actually has any common use cases. As I said earlier, the Monoid constraint is superfluous for your use case because you never mappend. So we should ask, then, can we envision a situation in which one might need both mappend and isMempty, and thus have a legitimate need for a TestableMonoid constraint? It's possible I'm being shortsighted here, but I can't envision a case.
I think this is because of something Stephen Tetley touched on when he said that this "goes against the spirit of its Monoid." Tilt your head at the type signature of mappend with a slightly different parenthesization:
mappend :: a -> (a -> a)
mappend is a mapping from members of a set a to functions a -> a. A monoid is a way of viewing values as functions over those values. The monoid is a view of the world of a only through the window of what these functions let us see. And functions are very limited in what they let us see. The only thing we ever do with them is apply them. We never ask anything else of a function (as Stephen said, we have no introspection into them). So although, yes, you can bolt anything you want onto a subclass, in this case the thing we're bolting on feels very different in character from the base class we are extending, and it feels unlikely that there would be much intersection between the use cases of functions and the use cases of things that have general equality or a predicate like isMempty.
So finally I want to come back around to the simple and precise way to write this: Write code at the value level and stop worrying about classes. You don't need Monoid and you don't need Eq; all you need is an additional argument:
morSimple :: (t -> Bool)  -- ^ Determine whether a value should be discarded
          -> t -> t -> t
morSimple f x y = if f x then y else x
λ> morSimple null [1] []
[1]
λ> morSimple null [1] [2]
[1]
λ> morSimple null [] [2]
[2]

"Zipping" a plain list with a nested list

I am looking for an elegant solution to the following problem. I have two lists of the following types:
[Float] and, [[Float]]
The first list contains an infinite amount of random values. The second list contains values I no longer care about. Its structure is finite and must be preserved. The values of the first list need to replace those of the second.
Obviously, since the first list contains random values, I do not want to use them twice. Can anyone help me do this in a clear, concise, and terse way?
scramble :: [Float] -> [[Float]] -> [[Float]]
Give me your best shot
Using the split package for splitting:
import Data.List.Split (splitPlaces)
scramble x y = splitPlaces (map length y) x
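For example (an infinite first list is fine; splitPlaces only consumes as many elements as the shape requires):
GHCi> scramble [1 ..] [[10, 20], [30], [40, 50, 60]]
[[1,2],[3],[4,5,6]]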
Will this do?
flip . (evalState .) . traverse . traverse . const . state $ head &&& tail
EDIT: let me expand on the construction...
The essential centre of it is traverse . traverse. If you stare at the problem with sufficiently poor spectacles, you can see that it's "do something with the elements of a container of containers". For that sort of thing, traverse (from Data.Traversable) is a very useful gadget (ok, I'm biased).
traverse :: (Traversable f, Applicative a) => (s -> a t) -> f s -> a (f t)
or, if I change to longer but more suggestive type variables
traverse :: (Traversable containerOf, Applicative doingSomethingToGet) =>
(s -> doingSomethingToGet t) ->
containerOf s -> doingSomethingToGet (containerOf t)
Crucially, traverse preserves the structure of the container it operates on, whatever that might be. If you view traverse as a higher-order function, you can see that it gives back an operator on containers whose type fits with the type of operators on elements it demands. That's to say (traverse . traverse) makes sense, and gives you structure-preserving operations on two layers of container.
traverse . traverse ::
(Traversable g, Traversable f, Applicative a) => (s -> a t) -> g (f s) -> a (g (f t))
So we've got the key gadget for structure-preserving "do something" operations on lists of lists. The length and splitAt approach works fine for lists (the structure of a list is given by its length), but the essential characteristic of lists which enables that approach is already pretty much bottled by the Traversable class.
Now we need to figure out how to "do something". We want to replace the old elements with new things drawn successively from a supply stream. If we were allowed the side-effect of updating the supply, we could say what to do at each element: "return head of supply, updating supply with its tail". The State s monad (in Control.Monad.State which is an instance of Applicative, from Control.Applicative) lets us capture that idea. The type State s a represents computations which deliver a value of type a whilst mutating a state of type s. Typical such computations are made by this gadget.
state :: (s -> (a, s)) -> State s a
That's to say, given an initial state, just compute the value and the new state. In our case, s is a stream, head gets the value, tail gets the new state. The &&& operator (from Control.Arrow) is a nice way to glue two functions on the same data to get a function making a pair. So
head &&& tail :: [x] -> (x, [x])
which makes
state $ head &&& tail :: State [x] x
and thus
const . state $ head &&& tail :: u -> State [x] x
explains what to "do" with each element of the old container, namely ignore it and take a new element from the head of the supply stream.
Feeding that into (traverse . traverse) gives us a big mutatey traversal of type
f (g u) -> State [x] (f (g x))
where f and g are any Traversable structures (e.g. lists).
Now, to extract the function we want, taking the initial supply stream, we need to unpack the state-mutating computation as a function from initial state to final value. That's what this does:
evalState :: State s a -> s -> a
So we end up with something in
f (g u) -> [x] -> f (g x)
which had better get flipped if it's to match the original spec.
tl;dr The State [x] monad is a readymade tool for describing computations which read and update an input stream. The Traversable class captures a readymade notion of structure-preserving operation on containers. The rest is plumbing (and/or golf).
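Spelled out as a small compilable module, with the imports the construction relies on (mtl's Control.Monad.State for state and evalState, Control.Arrow for (&&&)); the example values are mine:
import Control.Arrow ((&&&))
import Control.Monad.State (evalState, state)

-- The one-liner from above, given a name and a list-specialised signature.
scramble :: [x] -> [[u]] -> [[x]]
scramble = flip . (evalState .) . traverse . traverse . const . state $ head &&& tail

main :: IO ()
main = print (scramble [10 ..] [[1, 2], [3], [4, 5, 6]])
-- prints [[10,11],[12],[13,14,15]]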
This is the obvious way to do it, but I take it this isn't terse enough?
scramble :: [a] -> [[a]] -> [[a]]
scramble _ [] = []
scramble xs (y : ys) = some : scramble rest ys
  where (some, rest) = splitAt (length y) xs
