How can this Haskell rolling sum implementation be improved?

How can I improve the following rolling sum implementation?
import Control.Monad.State
type Buffer = State BufferState (Maybe Double)
type BufferState = ([Double], Int, Int)
-- circular buffer
buff :: Double -> Buffer
buff newVal = do
  (list, ptr, len) <- get
  -- if the list is not full yet just accumulate the new value
  if length list < len
    then do
      put (newVal : list, ptr, len)
      return Nothing
    else do
      let nptr = (ptr - 1) `mod` len
          (as, v:bs) = splitAt ptr list
          nlist = as ++ (newVal : bs)
      put (nlist, nptr, len)
      return $ Just v
-- create initial state for circular buffer
initBuff l = ([], l-1, l)
-- use the circular buffer to calculate a rolling sum
rollSum :: Double -> State (Double, BufferState) (Maybe Double)
rollSum newVal = do
  (acc, bState) <- get
  let (lv, bState') = runState (buff newVal) bState
      acc' = acc + newVal
  -- subtract the old value if the circular buffer is full
  case lv of
    Just x  -> put (acc' - x, bState') >> (return $ Just (acc' - x))
    Nothing -> put (acc', bState') >> return Nothing
test :: (Double, BufferState) -> [Double] -> [Maybe Double] -> [Maybe Double]
test state [] acc = acc
test state (x:xs) acc =
  let (a, s) = runState (rollSum x) state
  in test s xs (a:acc)
main :: IO ()
main = print $ test (0, initBuff 3) [1,1,1,2,2,0] []
Buffer uses the State monad to implement a circular buffer. rollSum uses the State monad again to keep track of the rolling sum value and the state of the circular buffer.
How could I make this more elegant?
I'd like to implement other functions like rolling average or a difference, what could I do to make this easy?
Thanks!
EDIT
I forgot to mention I am using a circular buffer as I intend to use this code on-line and process updates as they arrive - hence the need to record state. Something like
newRollingSum = update rollingSum newValue

I haven't managed to decipher all of your code, but here is the plan I would take for solving this problem. First, an English description of the plan:
1. We need windows into the list of length n starting at each index.
   a. Make windows of arbitrary length.
   b. Truncate long windows to length n.
   c. Drop the last n-1 of these, which will be too short.
2. For each window, add up the entries.
This was the first idea I had; for windows of length three it's an okay approach because step 2 is cheap on such a short list. For longer windows, you may want an alternate approach, which I will discuss below; but this approach has the benefit that it generalizes smoothly to functions other than sum. The code might look like this:
import Data.List
rollingSums n xs
  = map sum                                -- add up the entries
  . zipWith (flip const) (drop (n-1) xs)   -- drop the last n-1
  . map (take n)                           -- truncate long windows
  . tails                                  -- make arbitrarily long windows
  $ xs
If you're familiar with the "equational reasoning" approach to optimization, you might spot a first place we can improve the performance of this function: by swapping the first map and zipWith, we can produce a function with the same behavior but with a map f . map g subterm, which can be replaced by map (f . g) to get slightly less allocation.
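The rewrite described above can be sketched as follows. The name rollingSumsFused is made up for illustration; the assumption is that moving map sum past the zipWith leaves the result unchanged, since zipWith (flip const) only decides how many windows are kept.

```haskell
import Data.List (tails)

-- Hypothetical name: rollingSums after moving `map sum` past the zipWith
-- and fusing it with `map (take n)` into a single map.
rollingSumsFused :: Num a => Int -> [a] -> [a]
rollingSumsFused n xs
  = zipWith (flip const) (drop (n-1) xs)  -- drop the last n-1
  . map (sum . take n)                    -- truncate and add up in one pass
  . tails                                 -- make arbitrarily long windows
  $ xs
```

It should give the same answers as rollingSums, including on infinite input.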
Unfortunately, for large n, this adds n numbers together in the inner loop; we would prefer to simply add the value at the "front" of the window and subtract the one at the "back". So we need to get trickier. Here's a new idea: we'll traverse the list twice in parallel, n positions apart. Then we'll use a simple function for getting the rolling sum (of unbounded window length) of prefixes of a list, namely, scanl (+), to convert this traversal into the actual sums we're interested in.
rollingSumsEfficient n xs = scanl (+) firstSum deltas where
  firstSum = sum (take n xs)
  deltas = zipWith (-) (drop n xs) xs -- front - back
There's one twist, which is that scanl never returns an empty list. So if it's important that you be able to handle short lists, you'll want another equation that checks for these. Don't use length, as that forces the entire input list into memory before starting the computation -- a potentially lethal performance mistake. Instead add a line like this above the previous definition:
rollingSumsEfficient n xs | null (drop (n-1) xs) = []
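Put together, the two equations might look like the sketch below. The type signature and the use of guards instead of two separate equations are my choices here, and the code assumes n is at least 1.

```haskell
-- The short-list check and the scanl-based definition combined.
-- Assumes n >= 1.
rollingSumsEfficient :: Num a => Int -> [a] -> [a]
rollingSumsEfficient n xs
  | null (drop (n-1) xs) = []               -- fewer than n elements: no full window
  | otherwise            = scanl (+) firstSum deltas
  where
    firstSum = sum (take n xs)
    deltas   = zipWith (-) (drop n xs) xs   -- front - back
```

A rolling average could then plausibly be obtained by dividing each sum by the window length, for example map (/ fromIntegral n) (rollingSumsEfficient n xs) for a list of Doubles.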
We can try these two out in ghci. You'll notice that they do not quite have the same behavior as yours:
*Main> rollingSums 3 [10^n | n <- [0..5]]
[111,1110,11100,111000]
*Main> rollingSumsEfficient 3 [10^n | n <- [0..5]]
[111,1110,11100,111000]
On the other hand, the implementations are considerably more concise and are fully lazy in the sense that they work on infinite lists:
*Main> take 5 . rollingSums 10 $ [1..]
[55,65,75,85,95]
*Main> take 5 . rollingSumsEfficient 10 $ [1..]
[55,65,75,85,95]

An efficient implementation of a rolling sum in Haskell:
rollingSums :: Num a => Int -> [a] -> Maybe [a]
rollingSums n xs | n <= 0 = Nothing
                 | otherwise = Just $ if length as == n then go (sum as) xs bs else []
  where
    (as, bs) = splitAt n xs
    go s xs [] = [s]
    go s xs (y:ys) = s : go (s + y - head xs) (tail xs) ys
Assuming that sum((i+1)..(i+1+n)) = sum(i..(i+n)) - arr[i] + arr[i+n+1]

Related

Are recursive calls in my "permutations with repetition" code accumulated to clog the RAM?

A bit of background:
I am an amateur programmer, having picked up Haskell a few months ago, in my spare time, after a period of Mathematica programming (my first language). I am currently going through my second Haskell book, by Will Kurt, but I still have miles to go before I can call myself comfortable around Haskell code. Codeabbey has been my platform for experimentation and learning so far.
I have written a piece of code to generate permutations of a given number, that deals with possible duplicate numbers, so for 588 it will internally generate 588, 858 and 885.
However, because I want to scale to pretty big input numbers (think perhaps even a hundred digits long), I don't want to output the whole list and then perform calculations on it, instead every number that is generated is checked on the spot for a certain property and if it has it, well, we have a winner, the number is returned as output and there's no need to go through the rest of the humongous list. If sadly no desired number is found and we unsuccessfully go through all possible permutations, it outputs a "0".
I have also opted to make it a command line program to feed values to it via gnu parallel for faster work.
So here is the code
import System.Environment
import Data.List
toDigits :: Integer -> [Integer]
toDigits n = map (\n -> read [n]) (show n)
fromDigits :: Integral a => [a] -> Integer
fromDigits list = fromDigitsHelperFunction list 0
fromDigitsHelperFunction :: Integral a => [a] -> Integer -> Integer
fromDigitsHelperFunction [] acc = acc
fromDigitsHelperFunction (x:[]) acc = (fromIntegral x) + acc
fromDigitsHelperFunction digits@(x:xs) acc = fromDigitsHelperFunction xs (acc + ((fromIntegral x) * 10 ^((length digits) - 1 )))
testPermutationsWithRepetition :: ([Integer],Int,[Int],[(Int,Integer)]) -> [Integer]
testPermutationsWithRepetition (digits, index, rotationMap, registeredPositions)
  | index == 0 && rotationMap !! index == 0 = [0,0,0] --finish state (no more recursion). Nothing more to do
  | index == digitsLength - 1 && beautyCheck (fromDigits digits) = digits
  | index == digitsLength - 1 = testPermutationsWithRepetition (digits, index-1, rotationMap, registeredPositions)
  | not ((index,digits!!index) `elem` registeredPositions) = testPermutationsWithRepetition (digits, index+1, rotationMap, (index,digits!!index):registeredPositions)
  | rotationMap!!index == 0 = testPermutationsWithRepetition (digits, index-1, restoredRotMap, restoredRegPositions)
  | rotationMap!!index > 0 && (index,digits!!index) `elem` registeredPositions = testPermutationsWithRepetition (shiftLDigits, index, subtractRot, registeredPositions)
  where digitsLength = length digits
        shiftLDigits = (fst splitDigits) ++ (tail $ snd splitDigits) ++ [head $ snd splitDigits]
        splitDigits = splitAt index digits
        restoredRotMap = (fst splitRotMap) ++ [digitsLength - index] ++ (tail $ snd splitRotMap)
        splitRotMap = splitAt index rotationMap
        restoredRegPositions = filter (\pos -> fst pos < index) registeredPositions --clear everything below the parent index
        subtractRot = (fst splitRotMap) ++ [(head $ snd splitRotMap) - 1] ++ (tail $ snd splitRotMap)
--Frontend function for testing permutations by inputting a single parameter (the number in digit form)
testPermsWithRep :: [Integer] -> [Integer]
testPermsWithRep digits = testPermutationsWithRepetition (digits, 0, [length $ digits, (length $ digits) -1 .. 1], [])
main :: IO ()
main = do
  args <- getArgs
  let number = read (head args) :: Integer
  let checkResult = fromDigits $ testPermsWithRep $ toDigits number
  print checkResult
It's really a sequential process with an index variable that points to a certain number on the digit list and performs a recursive call on that list based on my rules. The function tracks its progress through the digit list by recording the numbers visited in certain positions so far (to avoid repeating already visited paths) until it gets to the last digit (index == length - 1). If the number that we get there passes the beauty check, it exits with the number produced.
Now, in Mathematica (or I guess any imperative language) I would probably implement this with a While loop and Cases for its checks, and by the logic of the program, however long it took to compute (generate the permutations and check them for validity) it would take a moderate amount of memory, just enough to hold the list of "registeredPositions" really (you could call it the record of visited digits in specific positions, so it's a variable list as we go deeper in index but gets cleaned up as we move back up). However, in this case the recursive calls seem to stack up, and the whole thing acts as a fork bomb for sufficiently large numbers (e.g. 27777772222222222222222223333) and eventually crashes. Is this behaviour something that can be handled differently in Haskell or is there no way to avoid the recursion and memory hogging?
I really like Haskell because the programs make logical sense, but I would like to use it also for cases like this where performance (and resources) matters.
As a side note, my brother pointed to this Algorithm to print all permutations with repetition of numbers in C that is reasonably fast (only generates a list though) and most importantly has minimal memory footprint, although I can tell there's also recursion used in it. Other than that I'm clueless when it comes to C and I would like to stick to Haskell, if it can do what I want at the end of the day, that is.
Any help is welcome. Have a good day!
Edit:
Per Soleil's suggestion I update my post with additional info provided in the comments. Specifically:
After compiling with "ghc checking_program.hs" I run the program with "./checking_program 27777772222222222222222223333". On an i5 3470 with 4GB RAM it runs for about 10 minutes and exits with a segmentation fault. On my brother's 32GB machine he let it run until it took up 20GB of RAM. No need to go further I guess. My tests were on Ubuntu via Win10 WSL. His is bare Linux.
testPermsWithRep is just a front end for testPermutationsWithRepetition, so that I can only provide the number and testPermsWithRep creates the initial parameters and calls testPermutationsWithRepetition with those. It outputs exactly what testPermutationsWithRepetition outputs, either a number (in digit form) that passes the test, or [0,0,0]. Now the test, the beautyCheck function is simply a test for single digit divisors of that number, that returns True or False. I didn't include it because it really is inconsequential. It could even be just a "bigger than x number" test.
As an example, calling "testPermsWithRep [2,6,7,3]" will call "testPermutationsWithRepetition ([2,6,7,3], 0, [4,3,2,1],[])" and whatever comes out of that function, testPermsWithRep will return that as well.
The performance issue with your program doesn't have anything to do with recursion. Rather, you seem to be running up against an accumulation of a partially evaluated, lazy data structure in your rotation map. Your program will run in constant memory if you use the deepseq package to fully force evaluation of the restoredRotMap:
-- Install the `deepseq` package and add this import
import Control.DeepSeq
-- And then change this one case
  ... | rotationMap!!index == 0 = restoredRotMap `deepseq`
          testPermutationsWithRepetition (digits, index-1, restoredRotMap, restoredRegPositions)
Compiled with ghc -O2 and using beautyCheck _ = False, this runs with a fixed resident memory usage of about 6 megs.
Some other performance targets:
You might want to replace most of your Integer types with Int, as this will be faster. I think you only need Integer for the input to toDigits and the output of fromDigits, and everything else can be Int, since it's all indexes and digits.
An even bigger win will be to replace your rotation map and registered positions with better data structures. If you find yourself splicing up lists with lots of listpart1 ++ [x] ++ listpart2 calls, there are going to be enormous performance costs to that, and the linear lookups with (!!) aren't helping either.
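As a rough sketch of what "better data structures" could mean here, the rotation map might become an IntMap and the registered positions a Set, both from the containers package. All names below are hypothetical stand-ins for the list-based helpers in the question, and digits are assumed to be Int as suggested above.

```haskell
import qualified Data.IntMap.Strict as IM
import qualified Data.Set as Set

-- Hypothetical replacements for the list-based state in the question.
type RotationMap = IM.IntMap Int
type Registered  = Set.Set (Int, Int)   -- (index, digit)

-- Initial rotation map for a number with len digits: [len, len-1 .. 1]
initRotations :: Int -> RotationMap
initRotations len = IM.fromList (zip [0 ..] [len, len - 1 .. 1])

-- Counterpart of subtractRot: decrement the counter stored at one index.
subtractRot :: Int -> RotationMap -> RotationMap
subtractRot = IM.adjust (subtract 1)

-- Counterpart of restoredRotMap: reset the counter stored at one index.
restoreRot :: Int -> Int -> RotationMap -> RotationMap
restoreRot len index = IM.insert index (len - index)

-- Counterpart of restoredRegPositions: keep only entries whose index is
-- smaller than the given one.
restoreRegistered :: Int -> Registered -> Registered
restoreRegistered index = Set.filter (\(i, _) -> i < index)
```

Updates and lookups at an index then no longer need to split and re-append a list, and the membership test on registered positions becomes Set.member instead of a linear elem.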
So I am not 100% sure of this and I am also not 100% sure I understand your code.
But as far as I understand you are generating permutations without duplicates and then you are checking for some predicate wanting whatever single number that fulfils it.
I think it should help to use as many of the prelude functions as possible because afaik then the compiler understands it can optimize recursion into a loop. As a rule of thumb I was taught to avoid explicit recursion as much as possible and instead use prelude functions like map, filter and fold. Mainly you avoid reinventing the wheel this way but there also should be a higher chance of the compiler optimizing things.
So to solve your problem try generating a list of all permutations, then filter it using filter and then just do take 1 if you want the result that is found first. Because of Haskell's lazy evaluation take 1 makes it so that we are interested only in the first x in (x:xs) that a filter would return. Therefore filter will keep dropping elements from the, again lazily evaluated, list of permutations and when it finds one it stops.
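To illustrate the laziness argument, here is a minimal sketch. The name firstMatch is made up; it uses listToMaybe instead of take 1 so that the result is a Maybe rather than a list.

```haskell
import Data.Maybe (listToMaybe)

-- First element satisfying the predicate, if any. Because filter is lazy,
-- nothing after the first match is ever generated or tested.
firstMatch :: (a -> Bool) -> [a] -> Maybe a
firstMatch p = listToMaybe . filter p
```

For instance, firstMatch (> 10) [1..] terminates even though the input list is infinite. Data.List.find has the same behaviour.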
I found a permutation implementation on https://rosettacode.org/wiki/Permutations#Haskell
and used it to try this call:
take 1 $ filter ((> 67890123456789012345) . fromDigits) $ permutations' $ toDigits 12345678901234567890
it has been running for like 20 minutes now and RAM usage has stayed around 230 MB.
I hope that has answered/helped you at least in some way.
+ a bonus tip: you can simplify your fromDigits to this beautiful thing:
fromDigits :: Integral a => [a] -> Integer
fromDigits = foldl shiftAndAdd 0
  where shiftAndAdd acc d = 10 * acc + fromIntegral d
EDIT:
I read some more of the comments and I see you care about ignoring duplicates but I am afraid you'll have to go smarter about that, since if I understand correctly your implementation still generates all the duplicates it only throws them away after checking if they are in a list (which has O(n) complexity). And when you only care about finding one permutation that fits your predicate you drop the not fitting ones anyway.
And people have already correctly pointed out that !! is generally also very bad.
Thanks to everyone for your helpful answers and comments.
@lordQuick permutations used with filter is still terrible but that fromDigits code is a beauty, so I used it.
@k-a-buhr That's exactly what I did yesterday, also per others' suggestion, I replaced all use of !! and ++. When I did the latter all memory problems disappeared. Wow! I mean I knew ++ is bad I just didn't realise how bad! We're talking orders of magnitude bad! 3M of RAM vs several GB. Also, valid point about integers. I will try that.
Oh, also a very important thing. I replaced recursive calls with until. This is the approach I would have followed in Mathematica (a NestWhile function to be exact), and I'm glad I found it in Haskell. It seemed to make things a bit faster too.
Anyway, the revised code, that solves my memory issues is here for anyone if interested.
{-compiled with "ghc -Rghc-timing -O2 checking_program_v3.hs"-}
import System.Environment
import Data.List
--A little help with triples
fstOfThree (a, _, _) = a
sndOfThree (_, b, _) = b
thrOfThree (_, _, c) = c
--And then some with quads
fstOfFour (a, _, _, _) = a
sndOfFour (_, b, _, _) = b
thrOfFour (_, _, c, _) = c
--This function is a single pass test for single digit factors
--It will be called as many times as needed by pryForSDFactors
trySingleDigitsFactors :: (Bool, Integer, [Integer]) -> (Bool, Integer, [Integer])
trySingleDigitsFactors (True, n, f) = (True, n, f)
trySingleDigitsFactors (b, n, []) = (b, n, [])
trySingleDigitsFactors (b, n, (f:fs))
  | mod n f == 0 = (True, div n f, fs)
  | otherwise = trySingleDigitsFactors (False, n, fs)
--This function will take a number and repeatedly divide by single digits till it gets to a single digit if possible
--Then it will return True
pryForSDFactors :: Integer -> Bool
pryForSDFactors n
  | sndOfThree sdfTry < 10 = True
  | fstOfThree sdfTry == True = pryForSDFactors $ sndOfThree sdfTry
  | otherwise = False
  where sdfTry = trySingleDigitsFactors (False, n, [7,5,3,2])
toDigits :: Integer -> [Integer]
toDigits n = map (\n -> read [n]) (show n)
fromDigits :: Integral a => [a] -> Integer
fromDigits = foldl shiftAndAdd 0
  where shiftAndAdd acc d = 10 * acc + fromIntegral d
replaceElementAtPos :: a -> Int -> [a] -> [a]
replaceElementAtPos newElement pos [] = []
replaceElementAtPos newElement 0 (x:xs) = newElement:xs
replaceElementAtPos newElement pos (x:xs) = x : replaceElementAtPos newElement (pos-1) xs
checkPermutationsStep :: ([Integer],Int,[Int],[(Int,Integer)]) -> ([Integer],Int,[Int],[(Int,Integer)])
checkPermutationsStep (digits, index, rotationMap, registeredPositions)
  | index == digitsLength - 1 = (digits, index-1, rotationMap, registeredPositions)
  | not ((index, digitAtIndex) `elem` registeredPositions) = (digits, index+1, rotationMap, (index,digitAtIndex):registeredPositions)
  | rotationAtIndex == 0 = (digits, index-1, restoredRotMap, restoredRegPositions)
  | rotationAtIndex > 0 && (index, digitAtIndex) `elem` registeredPositions = (shiftLDigits, index, subtractRot, registeredPositions)
  where digitsLength = length digits
        digitAtIndex = head $ drop index digits
        rotationAtIndex = head $ drop index rotationMap
        --restoredRotMap = (fst splitRotMap) ++ [digitsLength - index] ++ (tail $ snd splitRotMap)
        restoredRotMap = replaceElementAtPos (digitsLength - index) index rotationMap
        --splitRotMap = splitAt index rotationMap
        restoredRegPositions = filter (\pos -> fst pos < index) registeredPositions --clear everything below the parent index
        shiftLDigits = (fst splitDigits) ++ (tail $ snd splitDigits) ++ [head $ snd splitDigits]
        splitDigits = splitAt index digits
        --subtractRot = (fst splitRotMap) ++ [(head $ snd splitRotMap) - 1] ++ (tail $ snd splitRotMap)
        subtractRot = replaceElementAtPos (rotationDigitAtIndex - 1) index rotationMap
        rotationDigitAtIndex = head $ drop index rotationMap
checkConditions :: ([Integer],Int,[Int],[(Int,Integer)]) -> Bool
checkConditions (digits, index, rotationMap, registeredPositions)
  | (index == 0 && rotationAtIndex == 0) || ((index == (length digits) - 1) && pryForSDFactors (fromDigits digits)) = True
  | otherwise = False
  where rotationAtIndex = head $ drop index rotationMap
testPermsWithRep :: Integer -> Integer
testPermsWithRep n
  | sndOfFour computationResult == 0 && (head . thrOfFour) computationResult == 0 = 0
  | otherwise = (fromDigits . fstOfFour) computationResult
  where computationResult = until checkConditions checkPermutationsStep (digitsOfn, 0 , [digitsLength, digitsLength -1 .. 1], [])
        digitsOfn = toDigits n
        digitsLength = length digitsOfn
main :: IO ()
main = do
  args <- getArgs
  let inputNumber = read (head args) :: Integer
  let checkResult = testPermsWithRep inputNumber
  print checkResult
Now, bear in mind that this code, as I've mentioned, checks for a condition of each generated permutation (single digit factors) on the spot, and moves on if False, but it's pretty easy to repurpose it for output list generation.
Sure it's now just inefficient in terms of big O complexity (scales terribly), and I was at first thinking of replacing lists with Data.Map because that's what I've learned so far (though not so comfortable with maps yet).
I've also read that there's a more efficient replacement for read since that's also called a lot for numbers-to-digits conversions.
@lordQuick I don't know about HashMaps or vectors yet but I'm still learning. Every little optimization will pay off in computation time because this is my first piece of "practical" code, not just Codeabbey credit.
Cheers!
Here is a solution using a more efficient, insertion-based algorithm to compute unique permutations:
import Data.List
permutationsNub :: Eq a => [a] -> [[a]]
permutationsNub = foldr (concatMap . insert) [[]]
  where insert y = foldr combine [[y]] . (zip <*> tail . tails)
          where combine (x, xs) xss = (y : x : xs) :
                  if y == x then [] else map (x :) xss
headDef :: a -> [a] -> a
headDef x [] = x
headDef x (h : t) = h
fromDigits :: Integral a => [a] -> Integer
fromDigits = foldl1' ((+) . (10 *)) . map fromIntegral
toDigits :: Integer -> [Int]
toDigits = map (read . pure) . show
firstValidPermutation :: (Integer -> Bool) -> Integer -> Integer
firstValidPermutation p =
  headDef 0 .
  filter p .
  map fromDigits .
  permutationsNub .
  toDigits
The basic idea is that, given the unique permutations of a list's tail, we can compute the unique permutations of the whole list by inserting its head into all of the tail's permutations, in every position that doesn't follow an occurrence of the head (to avoid creating duplicates). From my tests, permutationsNub seems to be faster than permutations from Data.List even when the input contains no repetitions. However, unlike that function, it consumes its input eagerly and thus cannot handle an infinite input. Exercise: Prove this algorithm's correctness.
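As a small worked example of the insertion idea, the definition above can be run on the digits of 588 from the question. The definition is repeated so the snippet stands alone; the binding example is only there for illustration.

```haskell
import Data.List (tails)

permutationsNub :: Eq a => [a] -> [[a]]
permutationsNub = foldr (concatMap . insert) [[]]
  where
    -- insert y into every position of a permutation, stopping after the
    -- first position that sits directly in front of another y
    insert y = foldr combine [[y]] . (zip <*> tail . tails)
      where
        combine (x, xs) xss =
          (y : x : xs) : if y == x then [] else map (x :) xss

-- Digits of 588: evaluates to [[5,8,8],[8,5,8],[8,8,5]]
example :: [[Int]]
example = permutationsNub [5, 8, 8]
```

The two 8s are inserted first, giving only [8,8]; the 5 is then placed in each of the three possible positions.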
to be continued

How to create an infinite list in Haskell where the new value consumes all the previous values

If I create an infinite list like this:
let t xs = xs ++ [sum(xs)]
let xs = [1,2] : map (t) xs
take 10 xs
I will get this result:
[
[1,2],
[1,2,3],
[1,2,3,6],
[1,2,3,6,12],
[1,2,3,6,12,24],
[1,2,3,6,12,24,48],
[1,2,3,6,12,24,48,96],
[1,2,3,6,12,24,48,96,192],
[1,2,3,6,12,24,48,96,192,384],
[1,2,3,6,12,24,48,96,192,384,768]
]
This is pretty close to what I am trying to do.
This current code uses the last value to define the next. But, instead of a list of lists, I would like to know some way to make an infinite list that uses all the previous values to define the new one.
So the output would be only
[1,2,3,6,12,24,48,96,192,384,768,1536,...]
I have the definition of the first element [1].
I have the rule of getting a new element, sum all the previous elements.
But, I could not put this in the Haskell grammar to create the infinite list.
Using my current code, I could take the list that I need, using the command:
xs !! 10
> [1,2,3,6,12,24,48,96,192,384,768,1536]
But, it seems to me, that it is possible doing this in some more efficient way.
Some Notes
I understand that, for this particular example, that was intentionally oversimplified, we could create a function that uses only the last value to define the next.
But, I am searching if it is possible to read all the previous values into an infinite list definition.
I am sorry if the example that I used created some confusion.
Here another example, that is not possible to fix using reading only the last value:
isMultipleByList :: Integer -> [Integer] -> Bool
isMultipleByList _ [] = False
isMultipleByList v (x:xs) = if (mod v x == 0)
                              then True
                              else (isMultipleByList v xs)
nextNotMultipleLoop :: Integer -> Integer -> [Integer] -> Integer
nextNotMultipleLoop step v xs = if not (isMultipleByList v xs)
                                  then v
                                  else nextNotMultipleLoop step (v + step) xs
nextNotMultiple :: [Integer] -> Integer
nextNotMultiple xs = if xs == [2]
                       then nextNotMultipleLoop 1 (maximum xs) xs
                       else nextNotMultipleLoop 2 (maximum xs) xs
addNextNotMultiple xs = xs ++ [nextNotMultiple xs]
infinitePrimeList = [2] : map (addNextNotMultiple) infinitePrimeList
take 10 infinitePrimeList
[
[2,3],
[2,3,5],
[2,3,5,7],
[2,3,5,7,11],
[2,3,5,7,11,13],
[2,3,5,7,11,13,17],
[2,3,5,7,11,13,17,19],
[2,3,5,7,11,13,17,19,23],
[2,3,5,7,11,13,17,19,23,29],
[2,3,5,7,11,13,17,19,23,29,31]
]
infinitePrimeList !! 10
[2,3,5,7,11,13,17,19,23,29,31,37]
You can think of it this way:
You want to create a list (call it a) which starts with [1,2]:
a = [1,2] ++ ???
... and has this property: each subsequent element of a is the sum of all previous elements of a. So you can write
scanl1 (+) a
and get a new list in which the n-th element is the sum of the first n elements of a, that is, [1, 3, 6 ...]. All you need is to take all of its elements except the first:
tail (scanl1 (+) a)
So, you can define a as:
a = [1,2] ++ tail (scanl1 (+) a)
The same line of thought applies to similar problems where a list is defined through its own elements.
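Written out as a standalone definition, the self-referential list might look like this. The Integer type annotation is an addition for clarity.

```haskell
-- Each element after the first two is the sum of all elements before it.
-- The definition refers to itself; laziness makes this well defined because
-- every new element only needs elements that already exist.
a :: [Integer]
a = [1, 2] ++ tail (scanl1 (+) a)
```

In ghci, take 12 a gives [1,2,3,6,12,24,48,96,192,384,768,1536], matching the output asked for in the question.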
If we already had the final result, calculating the list of previous elements for a given element would be easy, a simple application of the inits function.
Let's assume we already have the final result xs, and use it to compute xs itself:
import Data.List (inits)
main :: IO ()
main = do
  let is = drop 2 $ inits xs
      xs = 1 : 2 : map sum is
  print $ take 10 xs
This produces the list
[1,2,3,6,12,24,48,96,192,384]
(Note: this is less efficient than SergeyKuz1001's solution, because the sum is re-calculated each time.)
unfoldr has a quite nice flexibility to adapt to various "create-a-list-from-initial-conditions"-problems so I think it is worth mentioning.
A little less elegant for this specific case, but shows how unfoldr can be used.
import Data.List
nextVal as = Just (s,as++[s])
  where s = sum as
initList = [1,2]
myList = initList ++ (unfoldr nextVal initList)
main = putStrLn . show . (take 12) $ myList
Yielding
[1,2,3,6,12,24,48,96,192,384,768,1536]
in the end.
As pointed out in the comment, one should think a little when using unfoldr. The way I've written it above, the code mimics the code in the original question. However, this means that the accumulator is updated with as++[s], thus constructing a new list at every iteration. A quick run at https://repl.it/languages/haskell suggests it becomes quite memory intensive and slow (4.5 seconds to access the 2000th element in myList).
Simply swapping the accumulator update to s:as produced a 7-fold speed increase. Since the same list can be reused as the accumulator in every step, it goes faster. However, the accumulator list is now in reverse order. For sum this makes no difference, but if the order of the list matters, one must take extra care.
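A sketch of the faster variant, assuming the accumulator is seeded with the reversed initial list so that it is stored newest-first throughout. The type signatures are additions.

```haskell
import Data.List (unfoldr)

-- The accumulator holds every element produced so far, newest first,
-- so each step conses one element instead of rebuilding the list with (++).
nextVal :: [Integer] -> Maybe (Integer, [Integer])
nextVal as = Just (s, s : as)
  where s = sum as

initList :: [Integer]
initList = [1, 2]

myList :: [Integer]
myList = initList ++ unfoldr nextVal (reverse initList)
```

Since sum does not care about element order, take 12 myList still gives [1,2,3,6,12,24,48,96,192,384,768,1536].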
You could define it like this:
xs = 1:2:iterate (*2) 3
For example:
Prelude> take 12 xs
[1,2,3,6,12,24,48,96,192,384,768,1536]
So here's my take. I tried not to create O(n) extra lists.
import Data.List (genericLength)
explode :: Integral i => (i -> [a] -> a) -> [a] -> [a]
explode fn init = as where
  as = init ++ [fn i as | i <- [l, l+1..]]
  l = genericLength init
This convenience function does create additional lists (by take). Hopefully they can be optimised away by the compiler.
explode' f = explode (\x as -> f $ take x as)
Usage examples:
myList = explode' sum [1,2]
sum' 0 xs = 0
sum' n (x:xs) = x + sum' (n-1) xs
myList2 = explode sum' [1,2]
In my tests there's little performance difference between the two functions. explode' is often slightly better.
The solution from @LudvigH is very nice and clear. But, it was not faster.
I am still working on the benchmark to compare the other options.
For now, this is the best solution that I could find:
-------------------------------------------------------------------------------------
-- # infinite sum of the previous using fuse
-------------------------------------------------------------------------------------
recursiveSum xs = [nextValue] ++ (recursiveSum (nextList)) where
  nextValue = sum(xs)
  nextList = xs ++ [nextValue]
initialSumValues = [1]
infiniteSumFuse = initialSumValues ++ recursiveSum initialSumValues
-------------------------------------------------------------------------------------
-- # infinite prime list using fuse
-------------------------------------------------------------------------------------
-- calculate the current value based in the current list
-- call the same function with the new combined value
recursivePrimeList xs = [nextValue] ++ (recursivePrimeList (nextList)) where
  nextValue = nextNotMultiple(xs)
  nextList = xs ++ [nextValue]
initialPrimes = [2]
infiniteFusePrimeList = initialPrimes ++ recursivePrimeList initialPrimes
This approach is fast and makes good use of many cores.
Maybe there is some faster solution, but I decided to post this to share my current progress on this subject so far.
In general, define
xs = x1 : zipWith f xs (inits xs)
Then it's xs == x1 : f x1 [] : f x2 [x1] : f x3 [x1, x2] : ...., and so on.
Here's one example of using inits in the context of computing the infinite list of primes, which pairs them up as
ps = 2 : f p1 [p1] : f p2 [p1,p2] : f p3 [p1,p2,p3] : ...
(in the definition of primes5 there).
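A concrete instance of this pattern, with a made-up f, might look like the sketch below. Here f adds the current element to the sum of everything before it, so each new element is the sum of all elements so far.

```haskell
import Data.List (inits)

-- Hypothetical f for illustration: the current element plus all elements
-- that precede it.
f :: Integer -> [Integer] -> Integer
f x previous = x + sum previous

-- xs == x1 : f x1 [] : f x2 [x1] : f x3 [x1, x2] : ...
xs :: [Integer]
xs = 1 : zipWith f xs (inits xs)
```

With x1 = 1 this gives take 6 xs == [1,1,2,4,8,16].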

Project Euler 50: Algorithm is incredibly slow, failing to understand why

I'm using Project Euler to learn Haskell. I'm new at Haskell and am having a lot of trouble coming up with an algorithm that doesn't take an absurd amount of time. I'm estimating that the program here would take 14 gigayears to arrive at the solution.
The problem:
Which prime, below one-million, can be written as the sum of the most
consecutive primes?
Here's my source. I've left out isPrime. I've posted it because it's far too inefficient to solve the problem. I think the issue lies with the slicedChains and primeChains calls, but I'm not sure what it is. I've resolved this before with C++. But for whatever reason, the efficient solution seems beyond me in Haskell.
Edit: I've included isPrime.
import System.Environment (getArgs)
import Data.List (nub,maximumBy)
import Data.Ord (comparing)
isPrime :: Integer -> Bool
isPrime 1 = False
isPrime 2 = True
isPrime x
  | any (== 0) (fmap (x `mod`) [2..x-1]) = False
  | otherwise = True
primeChain :: Integer -> [Integer]
primeChain x = [ n | n <- 1 : 2 : [3,5..x-1], isPrime n ]
slice :: [a] -> [Int] -> [a]
slice xs args = take (to - from + 1) (drop from xs)
  where from = head args
        to = last args
subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
                          in if n>l then [] else subsequencesBySize xs !! (l-n)
  where
    subsequencesBySize [] = [[[]]]
    subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                                in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])
slicedChains :: Int -> [Integer] -> [[Integer]]
slicedChains len xs = nub [x | x <- fmap (xs `slice`) subseqs, length x > 1]
  where subseqs = [x | x <- (subsequencesOfSize 2 [1..len]), (last x) > (head x)]
primeSums :: Integer -> [[Integer]]
primeSums x = filter (\ns -> sum ns == x) chain
  where xs = primeChain x
        len = length xs
        chain = slicedChains len xs
compLength :: [[a]] -> [a]
compLength xs = maximumBy (comparing length) xs
cleanSums :: [Integer] -> [[Integer]]
cleanSums xs = fmap (compLength) filtered
  where filtered = filter (not . null) (fmap primeSums xs)
main :: IO ()
main = do
  args <- getArgs
  let arg = read (head args) :: Integer
  let xs = primeChain arg
  print $ maximumBy (comparing length) $ cleanSums xs
Your basic problem is that you are not pruning your search space based on the best solution you have found so far.
I can tell this just from the fact that you are using maximumBy to find the longest sequence.
For instance, if during your search you find a consecutive sequence of 4 primes whose sum is a prime < 10^6, you don't have to examine any sequence which begins with a prime greater than 250000.
To do this kind of pruning you have to keep track of the solution found so far and interleave the testing of candidate sequences with their generation so that the best solution found so far can stop the search early.
Update
There are several inefficiencies in slicedChains. Haskell lists are implemented as linked lists. This video is a pretty good overview of linked lists and how they differ from arrays: (link)
The following expressions in your code are going to be problematic w.r.t. efficiency:
* nub has quadratic running time
* length x > 1 - the complexity of length is O(n) where n is the length of the list. A better way to write this is:
lengthGreaterThan1 :: [a] -> Bool
lengthGreaterThan1 (_:_:_) = True
lengthGreaterThan1 _ = False
* subsequencesOfSize 2 [1..len] may be more succinctly written:
[ [a,b] | a <- [1..len], b <- [a+1..len] ]
and this will also ensure that a < b.
* The take and drop calls in slice are also O(n)
* In primeSums the call to primeChain will regenerate essentially the same list over and over again, resulting in many repeated calls to isPrime. A better approach is to define primeChain like this:
allPrimes = filter isPrime [1..]
primeChain x = takeWhile (<= x) allPrimes
The list allPrimes will be generated once, and primeChain simply takes prefixes of that list.
* primeSums x is charged with finding sequences whose sum is exactly x, but it looks at a lot of sequences that can't possibly work. For instance, primeSums 31 will examine:
11 + 13 + 17, 11 + 13 + 17 + 23, 11 + 13 + 17 + 23 + 29,
17 + 19, 17 + 19 + 23, 17 + 19 + 23 + 29,
19 + 23, 19 + 23 + 29
23 + 29
even though it's pretty obvious that none of these sums could equal 31.
So the first thing you need is a good data structure: Once you find a sequence of length n you don't care about sequences of shorter length, so your primary needs are: (1) tracking the sum, (2) tracking the primes in the set, (3) removing the least element, (4) adding a new greatest element. The key is amortization, where a big cost is paid infrequently enough that you can pretend it is a small cost per procedure. The data structure looks like this:
data Queue x = Q [x] [x]
q_empty (Q [] []) = True
q_empty _ = False
q_headtails (Q (x:xs) rest) = (x, Q xs rest)
q_headtails (Q [] xs) = case reverse xs of
    y:ys -> (y, Q ys [])
    []   -> error "End of queue."
q_append el (Q beg end) = Q beg (el:end)
So deconstructing the queue is possible. It sometimes triggers an O(n) reverse, but that's OK: when it does, we won't have to do it again for another n steps, so it averages out to one operation per step. (You might also want to do it with a spine-strict list.)
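To make the amortization argument concrete, here is a minimal usage sketch of the queue above. The helper q_toList and the value example are hypothetical names added only for illustration; they are not part of the answer.

```haskell
-- Two-list queue, as in the answer: a front list plus a reversed back list.
data Queue x = Q [x] [x]

q_append :: x -> Queue x -> Queue x
q_append el (Q beg end) = Q beg (el : end)

q_headtails :: Queue x -> (x, Queue x)
q_headtails (Q (x:xs) rest) = (x, Q xs rest)
q_headtails (Q [] xs) = case reverse xs of
  y:ys -> (y, Q ys [])
  []   -> error "End of queue."

-- Hypothetical helper (not in the answer): drain a queue into a list,
-- oldest element first.
q_toList :: Queue x -> [x]
q_toList (Q [] []) = []
q_toList q         = let (x, q') = q_headtails q in x : q_toList q'

-- Append 1..5, drop the oldest element, then append 6.
-- Only the first q_headtails pays for a reverse; the next four are O(1).
example :: [Int]
example = q_toList (q_append 6 rest)
  where
    full      = foldl (flip q_append) (Q [] []) [1 .. 5]
    (_, rest) = q_headtails full
```

Here example evaluates to [2,3,4,5,6]: the sliding window dropped its least element and gained a new greatest one.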
To save on length operations and summing the items of the list you probably want to cache those, too:
type Length = Int
type Sum = Int
type Prime = Int
data PrimeSeq = PS Length Sum (Queue Prime)
headTails (PS len sum q) = (x, PS (len - 1) (sum - x) xs)
  where (x, xs) = q_headtails q
append x (PS len sum xs) = PS (len + 1) (sum + x) (q_append x xs)
The algorithm for these looks like:
Cache a copy of the PrimeSeq you're starting with
Keep adding primes to it and testing primality until you get to 10^6.
If you find a new prime with a longer sequence, replace the cache.
Whenever you run into 10^6, revert to the cache, pull a prime off the front of the queue, then repeat as needed.
Your prime generation is quadratic (isPrime 101 tests rem 101 100 == 0 even though 10 is the biggest number by which 101 needs to be tested -- and actually 7 is enough).
Yet even with it, a simple enough list-based code finds the answer in under 2 seconds (on an Intel Core i7 2.5 GHz, interpreted in GHCi). And with the code corrected to take advantage of the above mentioned optimization (and additionally, testing by primes only), it takes 0.1s.
Also, f x | t = False | otherwise = True is the same as f x = not t.
We are asked by the PE site not to give you even a hint.
But in general, the key to efficiency in Haskell, thanks to its laziness, is being generative with as small a duplication of effort as possible. As one example, instead of calculating each slice of a list in isolation starting anew, we can produce the bunch of them together as part of one process,
slices :: Int -> [a] -> [[a]]
slices n = map (take n) . iterate tail -- sequence of list's slices of length n each
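A small sketch of how slices behaves, with one caveat: on a finite list, iterate tail eventually reaches tail [], which raises an error if you consume that far. The variant slicesFinite and the helper windowSums are my own assumptions for illustration, not part of the answer.

```haskell
import Data.List (tails)

-- The function from the answer: fine on infinite lists.
slices :: Int -> [a] -> [[a]]
slices n = map (take n) . iterate tail

-- Assumed variant for finite input: `tails` stops at the empty list,
-- and the filter drops the short slices near the end.
slicesFinite :: Int -> [a] -> [[a]]
slicesFinite n = filter ((== n) . length) . map (take n) . tails

-- Hypothetical helper: the sum of every window of length n.
windowSums :: Int -> [Int] -> [Int]
windowSums n = map sum . slicesFinite n
```

For example, take 3 (slices 2 [1..]) gives [[1,2],[2,3],[3,4]], and windowSums 3 [1..5] gives [6,9,12].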
Another principle is, try to solve a more general problem, of which yours is an instance.
Having written such a function, we can play with it by trying out different values for its parameters, from smaller to the bigger ones, for an exploratory style of problem solving. We're told about 21 consecutive primes. What about 22 of them? 27? 1127 of them? ... and I've said enough about this already.
If it starts taking too much time, we can assess the full solution's needed run time by empirical orders of growth analysis.
Though the solution is found quickly enough with your unoptimized isPrime code, the exploratory process can be prohibitively slow with it, but it is fast enough with the optimized code:
primes :: [Int]
primes = 2 : filter isPrime [3,5..]
isPrime n = and [rem n p > 0 | p <- takeWhile ((<= n).(^2)) primes]

How to split a [String] in to [[String]] based on length

I'm trying to split a list of Strings into a List of Lists of Strings
so like in the title [String] -> [[String]]
This has to be done based on length of characters, so that the Lists in the output are no longer than 10. So if input was length 20 this would be broken down into 2 lists and if length 21 into 3 lists.
I'm not sure what to use to do this, I don't even know how to break down a list into a list of lists, never mind based on a certain length.
For example if the limit was 5 and the input was:
["abc","cd","abcd","ab"]
The output would be:
[["abc","cd"],["abcd"],["ab"]]
I'd like to be pointed in the right direction and what methods to use, list comprehension? recursion?
Here's an intuitive solution:
import Data.List (foldl')
breakup :: Int -> [[a]] -> [[[a]]]
breakup size = foldl' accumulate [[]]
  where accumulate broken l
          | length l > size = error "Breakup size too small."
          | sum (map length (last broken ++ [l])) <= size
              = init broken ++ [last broken ++ [l]]
          | otherwise = broken ++ [[l]]
Now, let's go through it line-by-line:
breakup :: Int -> [[a]] -> [[[a]]]
Since you hinted that you may want to generalize the function to accept different size limits, our type signature reflects this. We also generalize beyond [String] (that is, [[Char]]), since our problem is not specific to [[Char]], and could equally apply to any [[a]].
breakup size = foldl' accumulate [[]]
We're using a left fold because we want to transform a list, left-to-right, into our target, which will be a list of sub-lists. Even though we're not concerned with efficiency, we're using Data.List.foldl' instead of Prelude's own foldl because this is standard practice. You can read more about foldl vs. foldl' here.
Our folding function is called accumulate. It will consider a new item and decide whether to place it in the last-created sub-list or to start a new sub-list. To make that judgment, it uses the size we passed in. We start with an initial value of [[]], that is, a list with one empty sub-list.
Now the question is, how should you accumulate your target?
where accumulate broken l
We're using broken to refer to our constructed target so far, and l (for "list") to refer to the next item to process. We'll use guards for the different cases:
| length l > size = error "Breakup size too small."
We need to raise an error if the item surpasses the size limit on its own, since there's no way to place it in a sub-list that satisfies the size limit. (Alternatively, we could build a safe function by wrapping our return value in the Maybe monad, and that's something you should definitely try out on your own.)
| sum (map length (last broken ++ [l])) <= size
= init broken ++ [last broken ++ [l]]
The guard condition is sum (map length (last broken ++ [l])) <= size, and the return value for this guard is init broken ++ [last broken ++ [l]]. Translated into plain English, we might say, "If the item can fit in the last sub-list without going over the size limit, append it there."
| otherwise = broken ++ [[l]]
On the other hand, if there isn't enough "room" in the last sub-list for this item, we start a new sub-list, containing only this item. When the accumulate helper is applied to the next item in the input list, it will decide whether to place that item in this sub-list or start yet another sub-list, following the same logic.
There you have it. Don't forget to import Data.List (foldl') up at the top. As another answer points out, this is not a performant solution if you plan to process 100,000 strings. However, I believe this solution is easier to read and understand. In many cases, readability is the more important optimization.
Thanks for the fun question. Good luck with Haskell, and happy coding!
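For anyone who wants to compare notes on the Maybe exercise mentioned above, here is one possible sketch. It assumes foldM from Control.Monad as a drop-in for the fold; the name breakupSafe is hypothetical. An oversized item makes the whole result Nothing instead of raising an error.

```haskell
import Control.Monad (foldM)

-- Same logic as breakup, but failure is reported as Nothing.
breakupSafe :: Int -> [[a]] -> Maybe [[[a]]]
breakupSafe size = foldM accumulate [[]]
  where
    accumulate broken l
      -- a single item that exceeds the limit cannot be placed anywhere
      | length l > size = Nothing
      -- the item still fits in the last sub-list
      | sum (map length (last broken ++ [l])) <= size
          = Just (init broken ++ [last broken ++ [l]])
      -- otherwise start a new sub-list
      | otherwise = Just (broken ++ [[l]])
```

With this, breakupSafe 5 ["abc","cd","abcd","ab"] is Just [["abc","cd"],["abcd"],["ab"]], while breakupSafe 5 ["abcdef"] is Nothing.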
You can do something like this:
splitByLen :: Int -> [String] -> [[String]]
splitByLen n s = go (zip s $ scanl1 (+) $ map length s) 0
  where go [] _ = []
        go xs prev = let (lst, rest) = span (\ (x, c) -> c - prev <= n) xs
                     in (map fst lst) : go rest (snd $ last lst)
And then:
*Main> splitByLen 5 ["abc","cd","abcd","ab"]
[["abc","cd"],["abcd"],["ab"]]
In case there is a string longer than n, this function will fail. Now, what you want to do in those cases depends on your requirements and that was not specified in your question.
[Update]
As requested by @amar47shah, I made a benchmark comparing his solution (breakup) with mine (splitByLen):
import Data.List
import Data.Time.Clock
import Control.DeepSeq
import System.Random
main :: IO ()
main = do
  s <- mapM (\ _ -> randomString 10) [1..10000]
  test "breakup 10000" $ breakup 10 s
  test "splitByLen 10000" $ splitByLen 10 s
  putStrLn ""
  r <- mapM (\ _ -> randomString 10) [1..100000]
  test "breakup 100000" $ breakup 10 r
  test "splitByLen 100000" $ splitByLen 10 r
test :: (NFData a) => String -> a -> IO ()
test s a = do time1 <- getCurrentTime
              time2 <- a `deepseq` getCurrentTime
              putStrLn $ s ++ ": " ++ show (diffUTCTime time2 time1)
randomString :: Int -> IO String
randomString n = do
  l <- randomRIO (1,n)
  mapM (\ _ -> randomRIO ('a', 'z')) [1..l]
Here are the results:
breakup 10000: 0.904012s
splitByLen 10000: 0.005966s
breakup 100000: 150.945322s
splitByLen 100000: 0.058658s
Here is another approach. It is clear from the problem that the result is a list of lists and we need a running length and an inner list to keep track of how much we have accumulated (We use foldl' with these two as input). We then describe what we want which is basically:
If the length of the current input string itself exceeds the input length, we ignore that string (you may change this if you want a different behavior).
If the new length after we have added the length of the current string is within our input length, we add it to the current result list.
If the new length exceeds the input length, we add the result so far to the output and start a new result list.
import Data.List (foldl')
chunks len = reverse . map reverse . snd . foldl' f (0, [[]]) where
  f resSoFar@(lenSoFar, currRes : acc) curr
    | currLength > len = resSoFar -- ignore
    | newLen <= len = (newLen, (curr : currRes) : acc)
    | otherwise = (currLength, [curr] : currRes : acc)
    where
      newLen = lenSoFar + currLength
      currLength = length curr
Every time we add a result to the output list, we add it to the front hence we need reverse . map reverse at the end.
> chunks 5 ["abc","cd","abcd","ab"]
[["abc","cd"],["abcd"],["ab"]]
> chunks 5 ["abc","cd","abcdef","ab"]
[["abc","cd"],["ab"]]
Here is an elementary approach. First, the type String doesn't matter, so we can define our function in terms of a general type a:
breakup :: [a] -> [[a]]
I'll illustrate with a limit of 3 instead of 10. It'll be obvious how to implement it with another limit.
The first pattern will handle lists which are of size >= 3 and the second pattern handles all of the other cases:
breakup (a1 : a2 : a3 : as) = [a1, a2, a3] : breakup as
breakup as = [ as ]
It is important to have the patterns in this order. That way the second pattern will only be used when the first pattern does not match, i.e. when there are fewer than 3 elements in the list.
Examples of running this on some inputs:
breakup [1..5] -> [ [1,2,3], [4,5] ]
breakup [1..4] -> [ [1,2,3], [4] ]
breakup [1..2] -> [ [1,2] ]
breakup [1..3] -> [ [1,2,3], [] ]
We see there is an extra [] when we run the function on [1..3]. Fortunately this is easy to fix by inserting another rule before the last one:
breakup [] = []
The complete definition is:
breakup :: [a] -> [[a]]
breakup [] = []
breakup (a1 : a2 : a3 : as) = [a1, a2, a3] : breakup as
breakup as = [ as ]
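If you don't want to write one pattern per limit, the same idea can be sketched with splitAt from the Prelude. The name breakupN and the guard against non-positive limits are my own assumptions. Note that, like the definition above, this splits by number of elements, not by total character length.

```haskell
-- Split a list into chunks of at most n elements each.
breakupN :: Int -> [a] -> [[a]]
breakupN n xs
  | n <= 0    = error "breakupN: limit must be positive"
  | null xs   = []                          -- same role as the `breakup [] = []` rule
  | otherwise = chunk : breakupN n rest
  where (chunk, rest) = splitAt n xs
```

For example, breakupN 3 [1..5] gives [[1,2,3],[4,5]] and breakupN 3 [1..3] gives [[1,2,3]], with no trailing empty list.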

Dovetail iteration over infinite lists in Haskell

I want to iterate 2 (or 3) infinite lists and find the "smallest" pair that satisfies a condition, like so:
until pred [(a,b,c) | a<-as, b<-bs, c<-cs]
where pred (a,b,c) = a*a + b*b == c*c
as = [1..]
bs = [1..]
cs = [1..]
The above wouldn't get very far, as a == b == 1 throughout the run of the program.
Is there a nice way to dovetail the problem, e.g. build the infinite sequence [(1,1,1),(1,2,1),(2,1,1),(2,1,2),(2,2,1),(2,2,2),(2,2,3),(2,3,2),..] ?
Bonus: is it possible to generalize to n-tuples?
There's a monad for that, Omega.
Prelude> let as = each [1..]
Prelude> let x = liftA3 (,,) as as as
Prelude> let x' = mfilter (\(a,b,c) -> a*a + b*b == c*c) x
Prelude> take 10 $ runOmega x'
[(3,4,5),(4,3,5),(6,8,10),(8,6,10),(5,12,13),(12,5,13),(9,12,15),(12,9,15),(8,15,17),(15,8,17)]
Using its applicative features, you can generalize to arbitrary tuples:
quadrupels = (,,,) <$> as <*> as <*> as <*> as -- or call it liftA4
But: this alone does not eliminate duplication, of course. It only gives you proper diagonalization. Maybe you could use monad comprehensions together with an approach like Thomas's, or just another mfilter pass (restricting to b /= c, in this case).
List comprehensions are a great (and concise) way to solve such problems. First, you know you want all combinations of (a,b,c) that might satisfy a^2 + b^2 = c^2. A helpful observation is that (considering only positive numbers) it will always be the case that a <= c && b <= c.
To generate our list of candidates we can thus say c ranges from 1 to infinity while a and b range from one to c.
[(a,b,c) | c <- [1..], a <- [1..c], b <- [1..c]]
To get to the solution we just need to add your desired equation as a guard:
[(a,b,c) | c <- [1..], a <- [1..c], b <- [1..c], a*a+b*b == c*c]
This is inefficient, but the output is correct:
[(3,4,5),(4,3,5),(6,8,10),(8,6,10),(5,12,13),(12,5,13),(9,12,15)...
There are more principled methods than blind testing that can solve this problem.
{- It depends on what is "smallest". But here is a solution for a concept of "smallest" if tuples were compared first by their max. number and then by their total sum. (You can just copy and paste my whole answer into a file as I write the text in comments.)
We will need nub later. -}
import Data.List (nub)
{- Just for illustration: the easy case with 2-tuples. -}
-- all the two-tuples where 'snd' is 'n'
tuples n = [(i, n) | i <- [1..n]]
-- all the two-tuples where 'snd' is in '1..n'
tuplesUpTo n = concat [tuples i | i <- [1..n]]
{-
To get all results, you will need to insert the flip of each tuple into the stream. But let's do that later and generalize first.
Building tuples of arbitrary length is somewhat difficult, so we will work on lists. I call them 'kList's, if they have a length 'k'.
-}
-- just copied from the tuples case, only we need a base case for k=1 and
-- we can combine all results utilizing the list monad.
kLists 1 n = [[n]]
kLists k n = do
  rest <- kLists (k-1) n
  add <- [1..head rest]
  return (add:rest)
-- same as above. all the klists with length k and max number of n
kListsUpTo k n = concat [kLists k i | i <- [1..n]]
-- we can do that unbounded as well, creating an infinite list.
kListsInf k = concat [kLists k i | i <- [1..]]
{-
The next step is rotating these lists around, because until now the largest number is always in the last place. So we just look at all rotations to get all the results. Using nub here is admittedly awkward, you can improve that. But without it, lists where all elements are the same are repeated k times.
-}
rotate n l = let (init, end) = splitAt n l
             in end ++ init
rotations k l = nub [rotate i l | i <- [0..k-1]]
rotatedKListsInf k = concatMap (rotations k) $ kListsInf k
{- What remains is to convert these lists into tuples. This is a bit awkward, because every n-tuple is a separate type. But it's straightforward, of course. -}
kListToTuple2 [x,y] = (x,y)
kListToTuple3 [x,y,z] = (x,y,z)
kListToTuple4 [x,y,z,t] = (x,y,z,t)
kListToTuple5 [x,y,z,t,u] = (x,y,z,t,u)
kListToTuple6 [x,y,z,t,u,v] = (x,y,z,t,u,v)
{- Some tests:
*Main> take 30 . map kListToTuple2 $ rotatedKListsInf 2
[(1,1),(1,2),(2,1),(2,2),(1,3),(3,1),(2,3),(3,2),(3,3),(1,4),(4,1),(2,4),(4,2),(3,4),
(4,3),(4,4),(1,5),(5,1),(2,5),(5,2),(3,5),(5,3),(4,5),(5,4),(5,5),(1,6),(6,1),
(2,6), (6,2), (3,6)]
*Main> take 30 . map kListToTuple3 $ rotatedKListsInf 3
[(1,1,1),(1,1,2),(1,2,1),(2,1,1),(1,2,2),(2,2,1),(2,1,2),(2,2,2),(1,1,3),(1,3,1),
(3,1,1),(1,2,3),(2,3,1),(3,1,2),(2,2,3),(2,3,2),(3,2,2),(1,3,3),(3,3,1),(3,1,3),
(2,3,3),(3,3,2),(3,2,3),(3,3,3),(1,1,4),(1,4,1),(4,1,1),(1,2,4),(2,4,1),(4,1,2)]
Edit:
I realized there is a bug: Just rotating the ordered lists isn't enough of course. The solution must be somewhere along the lines of having
rest <- concat . map (rotations (k-1)) $ kLists (k-1) n
in kLists, but then some issues with repeated outputs arise. You can figure that out, I guess. ;-)
-}
It really depends on what you mean by "smallest", but I assume you want to find a tuple of numbers with respect to its maximal element - so (2,2) is less than (1,3) (while standard Haskell ordering is lexicographic).
There is a package, data-ordlist, which is aimed precisely at working with ordered lists. Its function mergeAll (and mergeAllBy) allows you to combine a 2-dimensional matrix ordered in each direction into an ordered list.
First let's create a desired comparing function on tuples:
import Data.List (find)
import Data.List.Ordered
compare2 :: (Ord a) => (a, a) -> (a, a) -> Ordering
compare2 x y = compare (max2 x, x) (max2 y, y)
  where
    max2 :: Ord a => (a, a) -> a
    max2 (x, y) = max x y
Then using mergeAll we create a function that takes a comparator, a combining function (which must be monotonic in both arguments) and two sorted lists. It combines all possible elements from the two lists using the function and produces a sorted result list:
mergeWith :: (b -> b -> Ordering) -> (a -> a -> b) -> [a] -> [a] -> [b]
mergeWith cmp f xs ys = mergeAllBy cmp $ map (\x -> map (f x) xs) ys
With this function, it's very simple to produce tuples ordered according to their maximum:
incPairs :: [(Int,Int)]
incPairs = mergeWith compare2 (,) [1..] [1..]
Its first 10 elements are:
> take 10 incPairs
[(1,1),(1,2),(2,1),(2,2),(1,3),(2,3),(3,1),(3,2),(3,3),(1,4)]
and when we (for example) look for the first pair whose sum of squares is equal to 65:
find (\(x,y) -> x^2+y^2 == 65) incPairs
we get the correct result (4,7) (as opposed to (1,8) if lexicographic ordering were used).
This answer is for the more general problem with an unknown predicate. If the predicate is known, more efficient solutions are possible; other answers give solutions that rely on knowing that you don't need to iterate over all Ints for a given c.
When dealing with infinite lists, you need to perform a breadth-first search for the solution. The list comprehension only affords depth-first search, which is why you never arrive at a solution in your original code.
counters 0 xs = [[]]
counters n xs = concat $ foldr f [] gens where
  gens = [[x:t | t <- counters (n-1) xs] | x <- xs]
  f ys n = cat ys ([]:n)
  cat (y:ys) (x:xs) = (y:x): cat ys xs
  cat [] xs = xs
  cat xs [] = [xs]
main = print $ take 10 $ filter p $ counters 3 [1..] where
  p [a,b,c] = a*a + b*b == c*c
counters generates all possible counters for values from the specified range of digits, including an infinite range.
First, we obtain a list of generators of valid combinations of counters: for each permitted digit, combine it with all permitted combinations for counters of smaller size. This may result in a generator that produces an infinite number of combinations. So, we need to borrow from each generator evenly.
So gens is a list of generators. Think of this as a list of all counters starting with one digit: gens !! 0 is a list of all counters starting with 1, gens !! 1 is a list of all counters starting with 2, etc.
In order to borrow from each generator evenly, we could transpose the list of generators - that way we would get a list of first elements of the generators, followed by a list of second elements of the generators, etc.
Since the list of generators may be infinite, we cannot afford to transpose the list of generators, because we may never get to look at the second element of any generator (for an infinite number of digits we'd have an infinite number of generators). So, we enumerate the elements from the generators "diagonally": take the first element from the first generator; then take the second element from the first generator and the first from the second generator; then take the third element from the first generator, the second from the second, and the first element from the third generator, etc. This can be done by folding the list of generators with a function f, which zips together two lists (one list is the generator, the other is the already-zipped generators), the beginning of one of them being offset by one step by adding []: to the head. This is almost zipWith (:) ys ([]:n); the difference is that if n or ys is shorter than the other one, we don't drop the remainder of the other list. Note that folding with zipWith (:) ys n would be a transpose.
For this answer I will take "smallest" to refer to the sum of the numbers in the tuple.
To list all possible pairs in order, you can first list all of the pairs with a sum of 2, then all pairs with a sum of 3 and so on. In code
pairsWithSum n = [(i, n-i) | i <- [1..n-1]]
xs = concatMap pairsWithSum [2..]
Haskell doesn't have facilities for dealing with n-tuples without using Template Haskell, so to generalize this you will have to switch to lists.
ntuplesWithSum 1 s = [[s]]
ntuplesWithSum n s = concatMap (\i -> map (i:) (ntuplesWithSum (n-1) (s-i))) [1..s-n+1]
nums n = concatMap (ntuplesWithSum n) [n..]
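As a quick usage sketch, the definitions above can be combined with find from Data.List to answer the original question for triples. The type signatures and the name firstTriple are my additions; everything else is copied from above.

```haskell
import Data.List (find)

-- All lists of n positive integers that add up to s.
ntuplesWithSum :: Int -> Int -> [[Int]]
ntuplesWithSum 1 s = [[s]]
ntuplesWithSum n s = concatMap (\i -> map (i:) (ntuplesWithSum (n-1) (s-i))) [1..s-n+1]

-- All lists of n positive integers, ordered by their sum.
nums :: Int -> [[Int]]
nums n = concatMap (ntuplesWithSum n) [n..]

-- Hypothetical example: the first triple (in order of increasing sum)
-- that satisfies a*a + b*b == c*c.
firstTriple :: Maybe [Int]
firstTriple = find isPythagorean (nums 3)
  where
    isPythagorean [a, b, c] = a*a + b*b == c*c
    isPythagorean _         = False
```

Here take 6 (nums 2) is [[1,1],[1,2],[2,1],[1,3],[2,2],[3,1]], and firstTriple is Just [3,4,5].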
Here's another solution, with probably another slightly different idea of "smallest". My order is just "all tuples with max element N come before all tuples with max element N+1". I wrote the versions for pairs and triples:
gen2_step :: Int -> [(Int, Int)]
gen2_step s = [(x, y) | x <- [1..s], y <- [1..s], (x == s || y == s)]
gen2 :: Int -> [(Int, Int)]
gen2 n = concatMap gen2_step [1..n]
gen2inf :: [(Int, Int)]
gen2inf = concatMap gen2_step [1..]
gen3_step :: Int -> [(Int, Int, Int)]
gen3_step s = [(x, y, z) | x <- [1..s], y <- [1..s], z <- [1..s], (x == s || y == s || z == s)]
gen3 :: Int -> [(Int, Int, Int)]
gen3 n = concatMap gen3_step [1..n]
gen3inf :: [(Int, Int, Int)]
gen3inf = concatMap gen3_step [1..]
You can't really generalize it to N-tuples, though as long as you stay homogeneous, you may be able to generalize it if you use arrays. But I don't want to tie my brain into that knot.
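For what it's worth, a homogeneous generalization can be sketched with plain lists instead of arrays, using replicateM from Control.Monad. This is a swapped-in technique, not what the answer above had in mind, and the names genStep and genInf are hypothetical.

```haskell
import Control.Monad (replicateM)

-- All k-element lists over [1..s] that contain s at least once,
-- i.e. all lists whose maximum element is exactly s.
genStep :: Int -> Int -> [[Int]]
genStep k s = filter (elem s) (replicateM k [1..s])

-- The infinite enumeration: everything with max 1, then max 2, and so on.
genInf :: Int -> [[Int]]
genInf k = concatMap (genStep k) [1..]
```

For k = 2 this matches gen2_step with lists in place of pairs: genStep 2 3 is [[1,3],[2,3],[3,1],[3,2],[3,3]].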
I think this is the simplest solution if "smallest" is defined as x+y+z because after you find your first solution in the space of integer-valued Pythagorean triangles, your next solutions from the infinite list are bigger.
take 1 [(x,y,z) | y <- [1..], x <- [1..y], z <- [1..x], z*z + x*x == y*y]
-> [(4,5,3)]
It has the nice property that it returns each symmetrically unique solution only once. x and z are unbounded as well, because their ranges grow with y, which is infinite.
The following, by contrast, does not work, because the sequence for x never finishes, so y never advances past its first value, not to mention z. The rightmost generator is the innermost loop.
take 1 [(z,y,x)|z <- [1..],y <- [1..],x <- [1..],x*x + y*y == z*z]
Sorry, it's been quite a while since I did Haskell, so I'm going to describe it in words.
As I pointed out in my comment, it is not possible to find the smallest anything in an infinite list, since there could always be a smaller one.
What you can do is have a stream-based approach that takes the lists and returns a list with only 'valid' elements, i.e. those where the condition is met. Let's call this function triangle.
You can then compute the triangle list to some extent with take n (triangle ...) and from these n elements you can find the minimum.
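A rough sketch of that description in Haskell, under a few assumptions of mine: triangle takes the candidate values for c, restricts a and b to be at most c with a <= b, and "minimum" is taken to mean the smallest perimeter. The name smallestOf is hypothetical.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Stream of valid triples: only elements meeting the condition are kept.
-- Bounding a and b by c keeps every inner loop finite.
triangle :: [Integer] -> [(Integer, Integer, Integer)]
triangle cs = [ (a, b, c) | c <- cs, a <- [1..c], b <- [a..c], a*a + b*b == c*c ]

-- Look at the first n valid triples and pick the one with the smallest
-- perimeter. Assumes n >= 1; minimumBy fails on an empty list.
smallestOf :: Int -> (Integer, Integer, Integer)
smallestOf n = minimumBy (comparing perimeter) (take n (triangle [1..]))
  where perimeter (a, b, c) = a + b + c
```

Here take 3 (triangle [1..]) is [(3,4,5),(6,8,10),(5,12,13)] and smallestOf 3 is (3,4,5).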

Resources