I'm calculating the sum of a list after applying someFunction to every element of it like so:
sum (map someFunction myList)
someFunction is very resource heavy so to optimise it I want to stop calculating the sum if it goes above a certain threshold.
It seems like I need to use a fold, but I don't know how to break out of it if the accumulator reaches the threshold. My guess is to somehow compose fold and takeWhile, but I'm not exactly sure how.
Another technique is to use a foldM with Either to capture the early termination effect. Left signals early termination.
import Control.Monad(foldM)
sumSome :: (Num n,Ord n) => n -> [n] -> Either n n
sumSome thresh = foldM f 0
  where
    f a n
      | a >= thresh = Left a
      | otherwise   = Right (a+n)
To ignore the exit status, just compose with either id id.
sumSome' :: (Num n,Ord n) => n -> [n] -> n
sumSome' n = either id id . sumSome n
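As a minimal sketch of how the two outcomes can be told apart, the block below repeats sumSome so that it is self-contained and adds a hypothetical describe helper (not part of the answer above). Note that the threshold is checked before the next element is added, so a Right result can still exceed it.

```haskell
import Control.Monad (foldM)

-- Same definition as above, repeated so the sketch compiles on its own.
sumSome :: (Num n, Ord n) => n -> [n] -> Either n n
sumSome thresh = foldM f 0
  where
    f a n
      | a >= thresh = Left a          -- stop: n is never evaluated
      | otherwise   = Right (a + n)   -- keep going

-- Hypothetical helper, only for illustration: report which case occurred.
describe :: Show n => Either n n -> String
describe (Left s)  = "stopped early at " ++ show s
describe (Right s) = "consumed the whole list, sum " ++ show s
```

For example, sumSome 10 [1..] terminates with Left 10 even though the list is infinite, while sumSome 10 [4,4,4] gives Right 12 because the list ends before the next check.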
One option is to use scanl, which returns the list of intermediate results of a foldl.
Thus, scanl1 (+) (map someFunction myList) will return the partial sums of your calculation. Since Haskell is lazy, someFunction is not applied to an element of myList until the corresponding partial sum is actually needed. For example:
take 5 $ scanl1 (+) (map someFunction myList)
will call someFunction 5 times and return the list of the first 5 partial sums.
After that you can use either takeWhile or dropWhile to stop the calculation once a certain condition holds. For example:
head $ dropWhile (< 1000) $ scanl1 (+) [1..1000000000]
will stop as soon as the running sum reaches 1000, and returns 1035.
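One caveat with head . dropWhile is that it fails on a list whose partial sums never reach the limit. A minimal sketch of a total variant, assuming a hypothetical helper name firstSumAtLeast and using find from Data.List:

```haskell
import Data.List (find)

-- Hypothetical helper: the first partial sum that is >= limit, if any.
-- Laziness means elements after that point are never evaluated.
firstSumAtLeast :: (Num a, Ord a) => a -> [a] -> Maybe a
firstSumAtLeast limit = find (>= limit) . scanl1 (+)
```

firstSumAtLeast 1000 [1..] gives Just 1035, and a list that is too short gives Nothing instead of an exception.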
This does what you ask without building the intermediate list that scanl' would (and scanl would additionally build up thunks):
foldl'Breaking break reduced reducer acc list =
    foldr cons (\acc -> acc) list acc
  where
    cons x r acc | break acc x = reduced acc x
                 | otherwise   = r $! reducer acc x
cf. related wiki page.
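To show how this fold might be applied to the thresholded sum, here is a minimal sketch. It repeats the definition with the first two parameters renamed (only to avoid shadowing Prelude's break) and adds a hypothetical sumBelow wrapper that is not part of the answer above.

```haskell
-- Same fold as above; parameter names changed to avoid shadowing Prelude's break.
foldl'Breaking :: (b -> a -> Bool) -> (b -> a -> b) -> (b -> a -> b) -> b -> [a] -> b
foldl'Breaking shouldStop onStop step acc0 list =
    foldr cons id list acc0
  where
    cons x r acc
      | shouldStop acc x = onStop acc x        -- leave the fold here
      | otherwise        = r $! step acc x     -- strict accumulator, continue

-- Hypothetical wrapper: largest running sum that does not exceed limit.
sumBelow :: (Num a, Ord a) => a -> [a] -> a
sumBelow limit = foldl'Breaking (\acc x -> acc + x > limit) (\acc _ -> acc) (+) 0
```

With this stop condition the element that would push the sum over the limit is still evaluated once, but nothing after it is.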
Use a bounded addition operator instead of (+) with foldl.
foldl (\b a -> b + if b > someThreshold then 0 else a) 0 (map someFunction myList)
Because Haskell is non-strict, only the calls to someFunction that are needed to evaluate the if-then-else are actually evaluated. The fold itself still traverses the entire list, though.
> foldl (\b a -> b + if b > 10 then 0 else a) 0 (map (trace "foo") [1..20])
foo
foo
foo
foo
foo
15
Since sum [1..5] = 15 > 10, you can see that trace "foo" executes only 5 times, not 20.
Instead of foldl, though, you should use the strict version foldl' from Data.Foldable.
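A minimal sketch of the strict variant, assuming a hypothetical name boundedSumFold. As with the foldl version, the whole list spine is still traversed, so this only suits finite lists.

```haskell
import Data.List (foldl')

-- Hypothetical helper: strict left fold with a bounded addition step.
boundedSumFold :: (Num a, Ord a) => a -> [a] -> a
boundedSumFold limit = foldl' step 0
  where
    step b a
      | b > limit = b        -- a is never evaluated on this branch
      | otherwise = b + a
```

boundedSumFold 10 [1..20] evaluates to 15, and the elements after the fifth are never forced.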
You could try making your own sum function, maybe call it boundedSum that takes
an Integer upper bound
an [Integer] to sum over
a "sum up until this point" value to be compared with the upper bound
and returns the sum of the list, or upperBound as soon as the running sum exceeds it.
boundedSum :: Integer -> [Integer] -> Integer -> Integer
boundedSum upperBound (x : xs) prevSum =
  let currentSum = prevSum + x
  in
    if currentSum > upperBound
      then upperBound
      else boundedSum upperBound xs currentSum
boundedSum upperBound [] prevSum =
  prevSum
I think this way you won't "eat up" more of the list if the sum up until the current element exceeds upperBound.
EDIT: The answers to this question suggest better techniques than mine and the question itself looks rather similar to yours.
This is a possible solution:
last . takeWhile (<=100) . scanl (+) 0 . map (^2) $ [1..]
Dissected:
take your starting list ([1..] in the example)
map your expensive function ((^2))
compute partial sums scanl (+) 0
stop after the partial sums become too large (keep those (<=100))
take the last one
If performance matters, also try scanl', which might improve it.
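A minimal sketch of the same pipeline with scanl', wrapped in a hypothetical function sumWhileAtMost. It assumes limit is non-negative: the initial partial sum 0 must pass the takeWhile test, otherwise last is applied to an empty list.

```haskell
import Data.List (scanl')

-- Hypothetical helper: largest partial sum of (map f xs) that is <= limit.
-- Assumes limit >= 0 so that the initial 0 is always kept.
sumWhileAtMost :: (Num b, Ord b) => b -> (a -> b) -> [a] -> b
sumWhileAtMost limit f = last . takeWhile (<= limit) . scanl' (+) 0 . map f
```

sumWhileAtMost 100 (^2) [1..] gives 91, the same result as the one-liner above.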
Something like this using until :: (a -> Bool) -> (a -> a) -> a -> a from the Prelude
sumUntil :: Real a => a -> [a] -> a
sumUntil threshold u = result
  where
    (_, result) = until stopCondition next (u, 0)
    -- No local type signatures: stopCondition refers to the outer threshold,
    -- so a polymorphic local signature would not type-check.
    -- next is only called on a non-empty list, because stopCondition checks null first.
    next ((x:xs), y) = (xs, x + y)
    stopCondition (ls, x) = null ls || x > threshold
Then apply
sumUntil 10 (map someFunction myList)
This post is already a bit old, but I'd like to mention a way to generalize the nice code of @trevor-cook above. This breakable fold can return not only a default value or the accumulator, but also the index and the element of the list at which the breaking condition was satisfied:
import Control.Monad (foldM)
breakFold step initialValue list exitCondition exitFunction =
  either id (exitFunction (length list) (last list))
    (foldM f initialValue (zip [0..] list))
  where f acc (index,x)
          | exitCondition index x acc
                      = Left (exitFunction index x acc)
          | otherwise = Right (step index x acc)
It requires only the foldM import. Usage examples:
mysum thresh list = breakFold (\i x acc -> x + acc) 0 list
                      (\i x acc -> x + acc > thresh)
                      (\i x acc -> acc)
myprod thresh list = breakFold (\i x acc -> x * acc) 1 list
                       (\i x acc -> acc == thresh)
                       (\i x acc -> (i,x,acc))
returning
*myFile> mysum 42 [1,1..]
42
*myFile> myprod 0 ([1..5]++[0,0..])
(6,0,0)
*myFile> myprod 0 (map (\n->1/n) [1..])
(178,5.58659217877095e-3,0.0)
In this way, one can use the index and the last evaluated list value as input for further functions.
Despite the age of this post, I'll add a possible solution. I like continuations because I find them very useful in terms of flow control.
import Data.Function (fix)
breakableFoldl
  :: (b -> a -> (b -> r) -> (b -> r) -> r)
  -> b
  -> [a]
  -> (b -> r)
  -> r
breakableFoldl f b (x : xs) = \ exit ->
  f b x exit $ \ acc ->
    breakableFoldl f acc xs exit
breakableFoldl _ b _ = ($ b)
breakableFoldr
  :: (a -> b -> (b -> r) -> (b -> r) -> r)
  -> b
  -> [a]
  -> (b -> r)
  -> r
breakableFoldr f b l = \ exit ->
  fix (\ fold acc xs next ->
    case xs of
      x : xs' -> fold acc xs' (\ acc' -> f x acc' exit next)
      _ -> next acc) b l exit
exampleL = breakableFoldl (\ acc x exit next ->
    ( if acc > 15
      then exit
      else next . (x +)
    ) acc
  ) 0 [1..9] print
exampleR = breakableFoldr (\ x acc exit next ->
    ( if acc > 15
      then exit
      else next . (x +)
    ) acc
  ) 0 [1..9] print
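The final continuation does not have to be print. As a minimal sketch, the block below repeats breakableFoldl and passes id to get a pure result; sumPast is a hypothetical name chosen for illustration.

```haskell
-- Same definition as above, repeated so the sketch compiles on its own.
breakableFoldl
  :: (b -> a -> (b -> r) -> (b -> r) -> r)
  -> b -> [a] -> (b -> r) -> r
breakableFoldl f b (x : xs) = \exit ->
  f b x exit $ \acc ->
    breakableFoldl f acc xs exit
breakableFoldl _ b _ = ($ b)

-- Hypothetical helper: keep adding until the accumulator exceeds limit,
-- then hand the accumulator to the exit continuation (here: id).
sumPast :: Integer -> [Integer] -> Integer
sumPast limit xs = breakableFoldl step 0 xs id
  where
    step acc x exit next
      | acc > limit = exit acc
      | otherwise   = next (acc + x)
```

Because the recursion only continues through next, sumPast 15 [1..] terminates on an infinite list and returns 21.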
Using lists to model nondeterminism is problematic if the inputs can take infinitely many values. For example
pairs = [ (a,b) | a <- [0..], b <- [0..] ]
This will return [(0,0),(0,1),(0,2),...] and never get around to showing you any pair whose first element is not 0.
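A minimal sketch that makes the problem observable; the name stuckOnZero and the prefix length 10000 are arbitrary choices for illustration.

```haskell
pairs :: [(Integer, Integer)]
pairs = [ (a, b) | a <- [0 ..], b <- [0 ..] ]

-- Even a long prefix contains only pairs of the form (0, b),
-- because the inner generator never finishes for a = 0.
stuckOnZero :: Bool
stuckOnZero = all ((== 0) . fst) (take 10000 pairs)
```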
Using the Cantor pairing function to collapse a list of lists into a single list can get around this problem. For example, we can define a bind-like operator that orders its outputs more intelligently by
(>>>=) :: [a] -> (a -> [b]) -> [b]
as >>>= f = cantor (map f as)
cantor :: [[a]] -> [a]
cantor xs = go 1 xs
  where
    go _ [] = []
    go n xs = hs ++ go (n+1) ts
      where
        ys = filter (not.null) xs
        hs = take n $ map head ys
        ts = mapN n tail ys
mapN :: Int -> (a -> a) -> [a] -> [a]
mapN _ _ [] = []
mapN n f xs@(h:t)
  | n <= 0 = xs
  | otherwise = f h : mapN (n-1) f t
If we now wrap this up as a monad, we can enumerate all possible pairs
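newtype Select a = Select { runSelect :: [a] }
-- Functor and Applicative instances are required since GHC 7.10.
instance Functor Select where
  fmap f (Select as) = Select (map f as)
instance Applicative Select where
  pure a = Select [a]
  Select fs <*> Select as = Select $ fs >>>= \f -> map f as
instance Monad Select where
  Select as >>= f = Select $ as >>>= (runSelect . f)
pairs = runSelect $ do
  a <- Select [0..]
  b <- Select [0..]
  return (a,b)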
This results in
>> take 15 pairs
[(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0),(0,4),(1,3),(2,2),(3,1),(4,0)]
which is a much more desirable result. However, if we were to ask for triples instead, the ordering on the outputs isn't as "nice" and it's not even clear to me that all outputs are eventually included --
>> take 15 triples
[(0,0,0),(0,0,1),(1,0,0),(0,1,0),(1,0,1),(2,0,0),(0,0,2),(1,1,0),(2,0,1),(3,0,0),(0,1,1),(1,0,2),(2,1,0),(3,0,1),(4,0,0)]
Note that (2,0,1) appears before (0,1,1) in the ordering -- my intuition says that a good solution to this problem will order the outputs according to some notion of "size", which could be an explicit input to the algorithm, or could be given implicitly (as in this example, where the "size" of an input is its position in the input lists). When combining inputs, the "size" of a combination should be some function (probably the sum) of the size of the inputs.
Is there an elegant solution to this problem that I am missing?
TL;DR: It flattens two dimensions at a time, rather than flattening three at once. You can't tidy this up in the monad because >>= is binary, not ternary etc.
I'll assume you defined
(>>>=) :: [a] -> (a -> [b]) -> [b]
as >>>= f = cantor $ map f as
to interleave the list of lists.
You like that because it goes diagonally:
sums = runSelect $ do
  a <- Select [0..]
  b <- Select [0..]
  return (a+b)
gives
ghci> take 36 sums
[0,1,1,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5,5,5,5,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7]
so it pleasingly keeps the "sizes" in order. The pattern appears to be broken for triples, and you doubt completeness, but you needn't. It's doing the same trick, but twice, rather than for all three at once:
triplePairs = runSelect $ do
  a <- Select [0..]
  b <- Select [0..]
  c <- Select [0..]
  return $ (a,(b,c))
The second pair is treated as a single source of data, so notice that:
ghci> map fst $ take 36 pairs
[0,0,1,0,1,2,0,1,2,3,0,1,2,3,4,0,1,2,3,4,5,0,1,2,3,4,5,6,0,1,2,3,4,5,6,7]
ghci> map fst $ take 36 triplePairs
[0,0,1,0,1,2,0,1,2,3,0,1,2,3,4,0,1,2,3,4,5,0,1,2,3,4,5,6,0,1,2,3,4,5,6,7]
and (adding some spaces/newlines for clarity of pattern):
ghci> map snd $ take 36 pairs
[0, 1,0, 2,1,0, 3,2,1,0, 4,3,2,1,0, 5,4,3,2,1,0, 6,5,4,3,2,1,0, 7,6,5,4,3,2,1,0]
ghci> map snd $ take 36 triplePairs
[(0,0), (0,1),(0,0), (1,0),(0,1),(0,0), (0,2),(1,0),(0,1),(0,0),
(1,1),(0,2),(1,0),(0,1),(0,0),
(2,0),(1,1),(0,2),(1,0),(0,1),(0,0),
(0,3),(2,0),(1,1),(0,2),(1,0),(0,1),(0,0),
(1,2),(0,3),(2,0),(1,1),(0,2),(1,0),(0,1),(0,0)]
so you can see it's using exactly the same pattern. This doesn't preserve total sums and it oughtn't because we're getting to three dimensions by flattening two dimensions first before flattening the third in. The pattern is obscured, but it's just as guaranteed to make it to the end of the list.
Sadly if you want to do three dimensions in a sum-preserving way, you'll have to write cantor2, cantor3 and cantor4 functions, possibly a cantorN function, but you'll have to ditch the monadic interface, which is inherently based on the bracketing of >>=, hence two-at-a-time flattening of dimensions.
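To illustrate what a sum-preserving three-dimensional ordering could look like, here is a minimal sketch for the special case of natural-number triples. It is not a general cantor3 over arbitrary lists, and the name triplesBySum is hypothetical.

```haskell
-- All triples of naturals, grouped by total size n = a + b + c
-- for n = 0, 1, 2, ...; each group is finite, so every triple is reached.
triplesBySum :: [(Integer, Integer, Integer)]
triplesBySum =
  [ (a, b, n - a - b)
  | n <- [0 ..]
  , a <- [0 .. n]
  , b <- [0 .. n - a]
  ]
```

In this ordering (0,1,1), with sum 2, comes before (2,0,1), with sum 3, which is the property the question asks for.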
import Control.Applicative
import Control.Arrow
data Select a = Select [a]
              | Selects [Select a]
instance Functor Select where
  fmap f (Select x) = Select $ map f x
  fmap f (Selects xss) = Selects $ map (fmap f) xss
instance Applicative Select where
  pure = Select . (:[])
  Select fs <*> xs = Selects $ map (`fmap`xs) fs
  Selects fs <*> xs = Selects $ map (<*>xs) fs
instance Monad Select where
  return = pure
  Select xs >>= f = Selects $ map f xs
  Selects xs >>= f = Selects $ map (>>=f) xs
runSelect :: Select a -> [a]
runSelect = go 1
  where go n xs = uncurry (++) . second (go $ n+1) $ splitOff n xs
        splitOff n (Select xs) = second Select $ splitAt n xs
        splitOff n (Selects sls) = (concat hs, Selects $ tsl ++ rl)
          where ((hs, tsl), rl) = first (unzip . map (splitOff n)) $ splitAt n sls
*Select> take 15 . runSelect $ do { a<-Select [0..]; b<-Select [0..]; return (a,b) }
[(0,0),(0,1),(1,0),(1,1),(0,2),(1,2),(2,0),(2,1),(2,2),(0,3),(1,3),(2,3),(3,0),(3,1),(3,2)]
*Select> take 15 . runSelect $ do { a<-Select [0..]; b<-Select [0..]; c<-Select [0..]; return (a,b,c) }
[(0,0,0),(0,0,1),(0,1,0),(0,1,1),(1,0,0),(1,0,1),(1,1,0),(1,1,1),(0,0,2),(0,1,2),(0,2,0),(0,2,1),(0,2,2),(1,0,2),(1,1,2)]
Note that this is still not quite Cantor-tuples ((0,1,1) shouldn't come before (1,0,0)), but getting it correct would be possible as well in a similar manner.
A correct multidimensional enumerator could be represented with a temporary state object
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE OverlappingInstances #-}
class Space a b where
  slice :: a -> ([b], a)
instance Space [a] a where
  slice (l:ls) = ([l], ls)
  slice [] = ([], [])
instance (Space sp x) => Space ([sp], [sp]) x where
  slice (fs, b:bs) = let
      ss = map slice (b : fs)
      yield = concat $ map fst ss
    in (yield, (map snd ss, bs))
Here an N dimensional space is represented by a tuple of lists of N-1 dimensional subspaces that have and haven't been touched by the enumeration.
You can then use the following to produce a well ordered list
enumerate :: (Space sp x) => sp -> [x]
enumerate sp = let (sl, sp') = slice sp
               in sl ++ enumerate sp'
Example in Ideone.
The omega package does exactly what you want and guarantees that every element will be eventually visited:
import Control.Applicative
import Control.Monad.Omega
main = print . take 200 . runOmega $
  (,,) <$> each [0..] <*> each [0..] <*> each [0..]
Another option would be to use LogicT. It gives more flexibility (if you need) and has operations such as (>>-) that ensure that every combination is eventually encountered.
import Control.Applicative
import Control.Monad
import Control.Monad.Logic
-- | Convert a list into any MonadPlus.
each :: (MonadPlus m) => [a] -> m a
each = msum . map return
-- | A fair variant of '(<*>)' that ensures that both branches are explored.
(<#>) :: (MonadLogic m) => m (a -> b) -> m a -> m b
(<#>) f k = f >>- (\f' -> k >>- (\k' -> return $ f' k'))
infixl 4 <#>
main = print . observeMany 200 $
  (,,) <$> each [0..] <#> each [0..] <#> each [0..]