I wanted to make a generic function that folds over a wide range of inputs (see Making a single function work on lists, ByteStrings and Texts (and perhaps other similar representations)). As one answer suggested, the ListLike is just for that. Its FoldableLL class defines an abstraction for anything that is foldable. However, I need a monadic fold. So I need to define foldM in terms of foldl/foldr.
So far, my attempts failed. I tried to define
foldM'' :: (Monad m, LL.FoldableLL full a) => (b -> a -> m b) -> b -> full -> m b
foldM'' f z = LL.foldl (\acc x -> acc >>= (`f` x)) (return z)
but it runs out of memory on large inputs - it builds a large unevaluated tree of computations. For example, if I pass a large text file to
main :: IO ()
main = getContents >>= foldM'' idx 0 >> return ()
where
-- print the current index if 'a' is found
idx !i 'a' = print i >> return (i + 1)
idx !i _ = return (i + 1)
it eats up all memory and fails.
I have a feeling that the problem is that the monadic computations are composed in a wrong order - like ((... >>= ...) >>= ...) instead of (... >>= (... >>= ...)) but so far I didn't find out how to fix it.
Workaround: Since ListLike exposes mapM_, I constructed foldM on ListLikes by wrapping the accumulator into the state monad:
modifyT :: (Monad m) => (s -> m s) -> StateT s m ()
modifyT f = get >>= \x -> lift (f x) >>= put
foldLLM :: (LL.ListLike full a, Monad m) => (b -> a -> m b) -> b -> full -> m b
foldLLM f z c = execStateT (LL.mapM_ (\x -> modifyT (\b -> f b x)) c) z
While this works fine on large data sets, it's not very nice. And it doesn't answer the original question, if it's possible to define it on data that are just FoldableLL (without mapM_).
So the goal is to reimplement foldM using either foldr or foldl. Which one should it be? We want the input to be processed lazily and allow for infinte lists, this rules out foldl. So foldr is it going to be.
So here is the definition of foldM from the standard library.
foldM :: (Monad m) => (a -> b -> m a) -> a -> [b] -> m a
foldM _ a [] = return a
foldM f a (x:xs) = f a x >>= \fax -> foldM f fax xs
The thing to remember about foldr is that its arguments simply replace [] and : in the list (ListLike abstracts over that, but it still serves as a guiding principle).
So what should [] be replaced with? Clearly with return a. But where does a come from? It won’t be the initial a that is passed to foldM – if the list is not empty, when foldr reaches the end of the list, the accumulator should have changed. So we replace [] by a function that takes an accumulator and returns it in the underlying monad: \a -> return a (or simply return). This also gives the type of the thing that foldr will calculate: a -> m a.
And what should we replace : with? It needs to be a function b -> (a -> m a) -> (a -> m a), taking the first element of the list, the processed tail (lazily, of course) and the current accumulator. We can figure it out by taking hints from the code above: It is going to be \x rest a -> f a x >>= rest. So our implementation of foldM will be (adjusting the type variables to match them in the code above):
foldM'' :: (Monad m) => (a -> b -> m a) -> a -> [b] -> m a
foldM'' f z list = foldr (\x rest a -> f a x >>= rest) return list z
And indeed, now your program can consume arbitrary large input, spitting out the results as you go.
We can even prove, inductively, that the definitions are semantically equal (although we should probably do coinduction or take-induction to cater for infinite lists).
We want to show
foldM f a xs = foldM'' f a xs
for all xs :: [b]. For xs = [] we have
foldM f a []
≡ return a -- definition of foldM
≡ foldr (\x rest a -> f a x >>= rest) return [] a -- definition of foldr
≡ foldM'' f a [] -- definition of foldM''
and, assuming we have it for xs, we show it for x:xs:
foldM f a (x:xs)
≡ f a x >>= \fax -> foldM f fax xs --definition of foldM
≡ f a x >>= \fax -> foldM'' f fax xs -- induction hypothesis
≡ f a x >>= \fax -> foldr (\x rest a -> f a x >>= rest) return xs fax -- definition of foldM''
≡ f a x >>= foldr (\x rest a -> f a x >>= rest) return xs -- eta expansion
≡ foldr (\x rest a -> f a x >>= rest) return (x:xs) -- definition of foldr
≡ foldM'' f a (x:xs) -- definition of foldM''
Of course this equational reasoning does not tell you anything about the performance properties you were interested in.
Related
How can I get a maximum element of an effectful container where computing attribute to compare against also triggers an effect?
There has to be more readable way of doing things like:
latest dir = Turtle.fold (z (ls dir)) Fold.maximum
z :: MonadIO m => m Turtle.FilePath -> m (UTCTime, Turtle.FilePath)
z mx = do
x <- mx
d <- datefile x
return (d, x)
I used overloaded version rather than non-overloaded maximumBy but the latter seems better suite for ad-hoc attribute selection.
How can I be more methodic in solving similar problems?
So I know nothing about Turtle; no idea whether this fits well with the rest of the Turtle ecosystem. But since you convinced me in the comments that maximumByM is worth writing by hand, here's how I would do it:
maximumOnM :: (Monad m, Ord b) => (a -> m b) -> [a] -> m a
maximumOnM cmp [x] = return x -- skip the effects if there's no need for comparison
maximumOnM cmp (x:xs) = cmp x >>= \b -> go x b xs where
go x b [] = return x
go x b (x':xs) = do
b' <- cmp x'
if b < b' then go x' b' xs else go x b xs
I generally prefer the *On versions of things -- which take a function that maps to an Orderable element -- to the *By versions -- which take a function that does the comparison directly. A maximumByM would be similar but have a type like Monad m => (a -> a -> m Ordering) -> [a] -> m a, but this would likely force you to redo effects for each a, and I'm guessing it's not what you want. I find *On more often matches with the thing I want to do and the performance characteristics I want.
Since you're already familiar with Fold, you might want to get to know FoldM, which is similar.
data FoldM m a b =
-- FoldM step initial extract
forall x . FoldM (x -> a -> m x) (m x) (x -> m b)
You can write:
maximumOnM ::
(Ord b, Monad m)
=> (a -> m b) -> FoldM m a (Maybe a)
maximumOnM f = FoldM combine (pure Nothing) (fmap snd)
where
combine Nothing a = do
f_a <- f a
pure (Just (f_a, a))
combine o#(Just (f_old, old)) new = do
f_new <- f new
if f_new > f_old
then pure $ Just (f_new, new)
else pure o
Now you can use Foldl.foldM to run the fold on a list (or other Foldable container). Like Fold, FoldM has an Applicative instance, so you can combine multiple effectful folds into one that interleaves the effects of each of them and combines their results.
It's possible to run effects on foldables using reducers package.
I'm not sure if it's correct, but it leverages existing combinators and instances (except for Bounded (Maybe a)).
import Data.Semigroup.Applicative (Ap(..))
import Data.Semigroup.Reducer (foldReduce)
import Data.Semigroup (Max(..))
import System.IO (withFile, hFileSize, IOMode(..))
-- | maxLength
--
-- >>> getMax $ maxLength ["abc","a","hello",""]
-- 5
maxLength :: [String] -> (Max Int)
maxLength = foldReduce . map (length)
-- | maxLengthIO
--
-- Note, this runs IO...
--
-- >>> (getAp $ maxLengthIO ["package.yaml", "src/Lib.hs"]) >>= return . getMax
-- Just 1212
--
-- >>> (getAp $ maxLengthIO []) >>= return . getMax
-- Nothing
maxLengthIO :: [String] -> Ap IO (Max (Maybe Integer))
maxLengthIO xs = foldReduce (map (fmap Just . f) xs) where
f :: String -> IO Integer
f s = withFile s ReadMode hFileSize
instance Ord a => Bounded (Maybe a) where
maxBound = Nothing
minBound = Nothing
I understand the definition of >>= in term of join
xs >>= f = join (fmap f xs)
which also tells us that fmap + join yields >>=
I was wondering if for the List monad it's possible to define without join, as we do for example for Maybe:
>>= m f = case m of
Nothing -> Nothing
Just x -> f x
Sure. The actual definition in GHC/Base.hs is in terms of the equivalent list comprehension:
instance Monad [] where
xs >>= f = [y | x <- xs, y <- f x]
Alternatively, you could try the following method of working it out from scratch from the type:
(>>=) :: [a] -> (a -> [b]) -> [b]
We need to handle two cases:
[] >>= f = ???
(x:xs) >>= f = ???
The first is easy. We have no elements of type a, so we can't apply f. The only thing we can do is return an empty list:
[] >>= f = []
For the second, x is a value of type a, so we can apply f giving us a value of f x of type [b]. That's the beginning of our list, and we can concatenate it with the rest of the list generated by a recursive call:
(x:xs) >>= f = f x ++ (xs >>= f)
I need a function that does this:
>>> func (+1) [1,2,3]
[[2,2,3],[2,3,3],[2,3,4]]
My real case is more complex, but this example shows the gist of the problem. The main difference is that in reality using indexes would be infeasible. The List should be a Traversable or Foldable.
EDIT: This should be the signature of the function:
func :: Traversable t => (a -> a) -> t a -> [t a]
And closer to what I really want is the same signature to traverse but can't figure out the function I have to use, to get the desired result.
func :: (Traversable t, Applicative f) :: (a -> f a) -> t a -> f (t a)
It looks like #Benjamin Hodgson misread your question and thought you wanted f applied to a single element in each partial result. Because of this, you've ended up thinking his approach doesn't apply to your problem, but I think it does. Consider the following variation:
import Control.Monad.State
indexed :: (Traversable t) => t a -> (t (Int, a), Int)
indexed t = runState (traverse addIndex t) 0
where addIndex x = state (\k -> ((k, x), k+1))
scanMap :: (Traversable t) => (a -> a) -> t a -> [t a]
scanMap f t =
let (ti, n) = indexed (fmap (\x -> (x, f x)) t)
partial i = fmap (\(k, (x, y)) -> if k < i then y else x) ti
in map partial [1..n]
Here, indexed operates in the state monad to add an incrementing index to elements of a traversable object (and gets the length "for free", whatever that means):
> indexed ['a','b','c']
([(0,'a'),(1,'b'),(2,'c')],3)
and, again, as Ben pointed out, it could also be written using mapAccumL:
indexed = swap . mapAccumL (\k x -> (k+1, (k, x))) 0
Then, scanMap takes the traversable object, fmaps it to a similar structure of before/after pairs, uses indexed to index it, and applies a sequence of partial functions, where partial i selects "afters" for the first i elements and "befores" for the rest.
> scanMap (*2) [1,2,3]
[[2,2,3],[2,4,3],[2,4,6]]
As for generalizing this from lists to something else, I can't figure out exactly what you're trying to do with your second signature:
func :: (Traversable t, Applicative f) => (a -> f a) -> t a -> f (t a)
because if you specialize this to a list you get:
func' :: (Traversable t) => (a -> [a]) -> t a -> [t a]
and it's not at all clear what you'd want this to do here.
On lists, I'd use the following. Feel free to discard the first element, if not wanted.
> let mymap f [] = [[]] ; mymap f ys#(x:xs) = ys : map (f x:) (mymap f xs)
> mymap (+1) [1,2,3]
[[1,2,3],[2,2,3],[2,3,3],[2,3,4]]
This can also work on Foldable, of course, after one uses toList to convert the foldable to a list. One might still want a better implementation that would avoid that step, though, especially if we want to preserve the original foldable type, and not just obtain a list.
I just called it func, per your question, because I couldn't think of a better name.
import Control.Monad.State
func f t = [evalState (traverse update t) n | n <- [0..length t - 1]]
where update x = do
n <- get
let y = if n == 0 then f x else x
put (n-1)
return y
The idea is that update counts down from n, and when it reaches 0 we apply f. We keep n in the state monad so that traverse can plumb n through as you walk across the traversable.
ghci> func (+1) [1,1,1]
[[2,1,1],[1,2,1],[1,1,2]]
You could probably save a few keystrokes using mapAccumL, a HOF which captures the pattern of traversing in the state monad.
This sounds a little like a zipper without a focus; maybe something like this:
data Zippy a b = Zippy { accum :: [b] -> [b], rest :: [a] }
mapZippy :: (a -> b) -> [a] -> [Zippy a b]
mapZippy f = go id where
go a [] = []
go a (x:xs) = Zippy b xs : go b xs where
b = a . (f x :)
instance (Show a, Show b) => Show (Zippy a b) where
show (Zippy xs ys) = show (xs [], ys)
mapZippy succ [1,2,3]
-- [([2],[2,3]),([2,3],[3]),([2,3,4],[])]
(using difference lists here for efficiency's sake)
To convert to a fold looks a little like a paramorphism:
para :: (a -> [a] -> b -> b) -> b -> [a] -> b
para f b [] = b
para f b (x:xs) = f x xs (para f b xs)
mapZippy :: (a -> b) -> [a] -> [Zippy a b]
mapZippy f xs = para g (const []) xs id where
g e zs r d = Zippy nd zs : r nd where
nd = d . (f e:)
For arbitrary traversals, there's a cool time-travelling state transformer called Tardis that lets you pass state forwards and backwards:
mapZippy :: Traversable t => (a -> b) -> t a -> t (Zippy a b)
mapZippy f = flip evalTardis ([],id) . traverse g where
g x = do
modifyBackwards (x:)
modifyForwards (. (f x:))
Zippy <$> getPast <*> getFuture
In Haskell, I recently found the following function useful:
listCase :: (a -> [a] -> b) -> [a] -> [b]
listCase f [] = []
listCase f (x:xs) = f x xs : listCase f xs
I used it to generate sliding windows of size 3 from a list, like this:
*Main> listCase (\_ -> take 3) [1..5]
[[2,3,4],[3,4,5],[4,5],[5],[]]
Is there a more general recursion scheme which captures this pattern? More specifically, that allows you to generate a some structure of results by repeatedly breaking data into a "head" and "tail"?
What you are asking for is a comonad. This may sound scarier than monad, but is a simpler concept (YMMV).
Comonads are Functors with additional structure:
class Functor w => Comonad w where
extract :: w a -> a
duplicate :: w a -> w (w a)
extend :: (w a -> b) -> w a -> w b
(extendand duplicate can be defined in terms of each other)
and laws similar to the monad laws:
duplicate . extract = id
duplicate . fmap extract = id
duplicate . duplicate = fmap duplicate . duplicate
Specifically, the signature (a -> [a] -> b) takes non-empty Lists of type a. The usual type [a] is not an instance of a comonad, but the non-empty lists are:
data NE a = T a | a :. NE a deriving Functor
instance Comonad NE where
extract (T x) = x
extract (x :. _) = x
duplicate z#(T _) = T z
duplicate z#(_ :. xs) = z :. duplicate xs
The comonad laws allow only this instance for non-empty lists (actually a second one).
Your function then becomes
extend (take 3 . drop 1 . toList)
Where toList :: NE a -> [a] is obvious.
This is worse than the original, but extend can be written as =>> which is simpler if applied repeatedly.
For further information, you may start at What is the Comonad typeclass in Haskell?.
This looks like a special case of a (jargon here but it can help with googling) paramorphism, a generalisation of primitive recursion to all initial algebras.
Reimplementing ListCase
Let's have a look at how to reimplement your function using such a combinator. First we define the notion of paramorphism: a recursion principle where not only the result of the recursive call is available but also the entire substructure this call was performed on:
The type of paraList tells me that in the (:) case, I will have access to the head, the tail and the value of the recursive call on the tail and that I need to provide a value for the base case.
module ListCase where
paraList :: (a -> [a] -> b -> b) -- cons
-> b -- nil
-> [a] -> b -- resulting function on lists
paraList c n [] = n
paraList c n (x : xs) = c x xs $ paraList c n xs
We can now give an alternative definition of listCase:
listCase' :: (a -> [a] -> b) -> [a] -> [b]
listCase' c = paraList (\ x xs tl -> c x xs : tl) []
Considering the general case
In the general case, we are interested in building a definition of paramorphism for all data structures defined as the fixpoint of a (strictly positive) functor. We use the traditional fixpoint operator:
newtype Fix f = Fix { unFix :: f (Fix f) }
This builds an inductive structure layer by layer. The layers have an f shape which maybe better grasped by recalling the definition of List using this formalism. A layer is either Nothing (we're done!) or Just (head, tail):
newtype ListF a as = ListF { unListF :: Maybe (a, as) }
type List a = Fix (ListF a)
nil :: List a
nil = Fix $ ListF $ Nothing
cons :: a -> List a -> List a
cons = curry $ Fix . ListF .Just
Now that we have this general framework, we can define para generically for all Fix f where f is a functor:
para :: Functor f => (f (Fix f, b) -> b) -> Fix f -> b
para alg = alg . fmap (\ rec -> (rec, para alg rec)) . unFix
Of course, ListF a is a functor. Meaning we could use para to reimplement paraList and listCase.
instance Functor (ListF a) where fmap f = ListF . fmap (fmap f) . unListF
paraList' :: (a -> List a -> b -> b) -> b -> List a -> b
paraList' c n = para $ maybe n (\ (a, (as, b)) -> c a as b) . unListF
listCase'' :: (a -> List a -> b) -> List a -> List b
listCase'' c = paraList' (\ x xs tl -> cons (c x xs) tl) nil
You can implement a simple bijection toList, fromList to test it if you want. I could not be bothered to reimplement take so it's pretty ugly:
toList :: [a] -> List a
toList = foldr cons nil
fromList :: List a -> [a]
fromList = paraList' (\ x _ tl -> x : tl) []
*ListCase> fmap fromList . fromList . listCase'' (\ _ as -> toList $ take 3 $ fromList as). toList $ [1..5]
[[2,3,4],[3,4,5],[4,5],[5],[]]
Perhaps this is obvious, but I can't seem to figure out how to best filter an infinite list of IO values. Here is a simplified example:
infinitelist :: [IO Int]
predicate :: (a -> Bool)
-- how to implement this?
mysteryFilter :: (a -> Bool) -> [IO a] -> IO [a]
-- or perhaps even this?
mysteryFilter' :: (a -> Bool) -> [IO a] -> [IO a]
Perhaps I have to use sequence in some way, but I want the evaluation to be lazy. Any suggestions? The essence is that for each IO Int in the output we might have to check several IO Int values in the input.
Thank you!
Not doable without using unsafeInterleaveIO or something like it. You can't write a filter with the second type signature, since if you could you could say
unsafePerformIOBool :: IO Bool -> Bool
unsafePerformIOBool m = case mysteryFilter' id [m] of
[] -> False
(_:_) -> True
Similarly, the first type signature isn't going to work--any recursive call will give you back something of type IO [a], but then to build a list out of this you will need to perform this action before returning a result (since : is not in IO you need to use >>=). By induction you will have to perform all the actions in the list (which takes forever when the list is infinitely long) before you can return a result.
unsafeInterleaveIO resolves this, but is unsafe.
mysteryFilter f [] = return []
mysteryFilter f (x:xs) = do ys <- unsafeInterleaveIO $ mysteryFilter f xs
y <- x
if f y then return (y:ys) else return ys
the problem is that this breaks the sequence that the monad is supposed to provide. You no longer have guarantees about when your monadic actions happen (they might never happen, they might happen multiple times, etc).
Lists just do not play nice with IO. This is why we have the plethora of streaming types (Iteratees, Conduits, Pipes, etc).
The simplest such type is probably
data MList m a = Nil | Cons a (m (MList m a))
note that we observe that
[a] == MList Id a
since
toMList :: [a] -> MList Id a
toMList [] = Nil
toMList (x:xs) = Cons x $ return $ toMList xs
fromMList :: MList Id a -> [a]
fromMList Nil = []
fromMList (Cons x xs) = x:(fromMList . runId $ xs)
also, MList is a functor
instance Functor m => Functor (MList m) where
fmap f Nil = Nil
fmap f (Cons x xs) = Cons (f x) (fmap (fmap f) xs)
and it is a functor in the category of Functor's and Natural transformations.
trans :: Functor m => (forall x. m x -> n x) -> MList m a -> MList n a
trans f Nil = Nil
trans f (Cons x xs) = Cons x (f (fmap trans f xs))
with this it is easy to write what you want
mysteryFilter :: (a -> Bool) -> MList IO (IO a) -> IO (MList IO a)
mysteryFilter f Nil = return Nil
mysteryFilter f (Cons x xs)
= do y <- x
let ys = liftM (mysteryFilter f) xs
if f y then Cons y ys else ys
or various other similar functions.