Project Euler 50: Algorithm is incredibly slow, failing to understand why - haskell

I'm using Project Euler to learn Haskell. I'm new at Haskell and am having a lot of trouble coming up with an algorithm that doesn't take an absurd amount of time. I'm estimating that the program here would take 14 gigayears to arrive at the solution.
The problem:
Which prime, below one-million, can be written as the sum of the most
consecutive primes?
Here's my source (isPrime left out at first; see the edit below). I've posted it because it's far too inefficient to solve the problem. I think the issue lies with the slicedChains and primeChain calls, but I'm not sure what it is. I've solved this before in C++, but for whatever reason the efficient solution seems beyond me in Haskell.
Edit: I've included isPrime.
import System.Environment (getArgs)
import Data.List (nub,maximumBy)
import Data.Ord (comparing)
isPrime :: Integer -> Bool
isPrime 1 = False
isPrime 2 = True
isPrime x
  | any (== 0) (fmap (x `mod`) [2..x-1]) = False
  | otherwise = True

primeChain :: Integer -> [Integer]
primeChain x = [ n | n <- 1 : 2 : [3,5..x-1], isPrime n ]

slice :: [a] -> [Int] -> [a]
slice xs args = take (to - from + 1) (drop from xs)
  where
    from = head args
    to = last args

subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs =
    let l = length xs
    in if n > l then [] else subsequencesBySize xs !! (l - n)
  where
    subsequencesBySize [] = [[[]]]
    subsequencesBySize (x:xs) =
        let next = subsequencesBySize xs
        in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])

slicedChains :: Int -> [Integer] -> [[Integer]]
slicedChains len xs = nub [x | x <- fmap (xs `slice`) subseqs, length x > 1]
  where
    subseqs = [x | x <- (subsequencesOfSize 2 [1..len]), (last x) > (head x)]

primeSums :: Integer -> [[Integer]]
primeSums x = filter (\ns -> sum ns == x) chain
  where
    xs = primeChain x
    len = length xs
    chain = slicedChains len xs

compLength :: [[a]] -> [a]
compLength xs = maximumBy (comparing length) xs

cleanSums :: [Integer] -> [[Integer]]
cleanSums xs = fmap (compLength) filtered
  where
    filtered = filter (not . null) (fmap primeSums xs)

main :: IO()
main = do
  args <- getArgs
  let arg = read (head args) :: Integer
  let xs = primeChain arg
  print $ maximumBy (comparing length) $ cleanSums xs

Your basic problem is that you are not pruning your search space based on the best solution you have found so far.
I can tell this just from the fact that you are using maximumBy to find the longest sequence.
For instance, if during your search you find a consecutive sequence of 4 primes whose sum is a prime < 10^6, you don't have to examine any sequence which begins with a prime greater than 250000.
To do this kind of pruning you have to keep track of the solution found so far and interleave the testing of candidate sequences with their generation so that the best solution found so far can stop the search early.
Update
There are several inefficiencies in slicedChains. Haskell lists are implemented as linked lists. This video is a pretty good overview of linked lists and how they differ from arrays: (link)
The following expressions in your code are going to be problematic w.r.t. efficiency:
* nub has quadratic running time
* length x > 1 - the complexity of length is O(n) where n is the length of the list. A better way to write this is:
lengthGreaterThan1 :: [a] -> Bool
lengthGreaterThan1 (_:_:_) = True
lengthGreaterThan1 _ = False
* subsequencesOfSize 2 [1..len] may be more succinctly written:
[ [a,b] | a <- [1..len], b <- [a+1..len] ]
and this will also ensure that a < b.
* The take and drop calls in slice are also O(n)
* In primeSums the call to primeChain will regenerate essentially the same list over and over again resulting in a lot of multiple calls to isPrime. A better approach is to define primeChain like this:
allPrimes = filter isPrime [1..]
primeChain x = takeWhile (<= x) allPrimes
The list allPrimes will be generated once, and primeChain simply takes prefixes of that list.
* primeSums x is charged with finding sequences whose sum is exactly x, but it looks at a lot of sequences that can't possibly work. For instance, primeSums 31 will examine:
11 + 13 + 17, 11 + 13 + 17 + 23, 11 + 13 + 17 + 23 + 29,
17 + 19, 17 + 19 + 23, 17 + 19 + 23 + 29,
19 + 23, 19 + 23 + 29
23 + 29
even though it's pretty obvious that none of these sums could equal 31.

So the first thing you need is a good data structure: Once you find a sequence of length n you don't care about sequences of shorter length, so your primary needs are: (1) tracking the sum, (2) tracking the primes in the set, (3) removing the least element, (4) adding a new greatest element. The key is amortization, where a big cost is paid infrequently enough that you can pretend it is a small cost per procedure. The data structure looks like this:
data Queue x = Q [x] [x]

q_empty (Q [] []) = True
q_empty _ = False

q_headtails (Q (x:xs) rest) = (x, Q xs rest)
q_headtails (Q [] xs) = case reverse xs of
  y:ys -> (y, Q ys [])
  [] -> error "End of queue."

q_append el (Q beg end) = Q beg (el:end)
So deconstructing the list is possible, and it sometimes triggers an O(n) operation, but that's OK: when it does, we won't have to do it again for another n steps, so it averages out to one operation per step. (You might also want to do it with a spine-strict list.)
To save on length operations and summing the items of the list you probably want to cache those, too:
type Length = Int
type Sum = Int
type Prime = Int
data PrimeSeq = PS Length Sum (Queue Prime)
headTails (PS len sum q) = (x, PS (len - 1) (sum - x) xs)
  where (x, xs) = q_headtails q
append x (PS len sum xs) = PS (len + 1) (sum + x) (q_append x xs)
The algorithm for these looks like:
* Cache a copy of the PrimeSeq you're starting with.
* Keep adding primes to it and testing primality until you get to 10^6.
* If you find a new prime with a longer sequence, replace the cache.
* Whenever you run into 10^6, revert to the cache, pull a prime off the front of the queue, then repeat as needed.
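A simplified, list-based sketch of that plan (my own illustration, not the Queue/PrimeSeq machinery above; names like runsFrom are mine): walking over the suffixes of the prime list plays the role of pulling primes off the front, each run is extended while its cached sum stays below 10^6, and the longest run whose sum is prime wins. It is not tuned for speed, but it avoids recomputing primeChain and the combinatorial blow-up of slicedChains entirely.
import Data.List (maximumBy)
import Data.Ord (comparing)

limit :: Int
limit = 1000000

primes :: [Int]
primes = 2 : filter isPrime [3,5..]
  where isPrime n = all (\p -> n `rem` p /= 0) (takeWhile (\p -> p * p <= n) primes)

-- trial-division primality test for the run sums (all of them are < limit)
isPrimeSum :: Int -> Bool
isPrimeSum n = n > 1 && all (\p -> n `rem` p /= 0) (takeWhile (\p -> p * p <= n) primes)

-- (run length, run sum) for every prime-summing run starting at the head of ps,
-- stopping as soon as the running sum reaches the limit
runsFrom :: [Int] -> [(Int, Int)]
runsFrom ps = [ (len, s)
              | (len, s) <- zip [1 ..] (takeWhile (< limit) (scanl1 (+) ps))
              , isPrimeSum s ]

main :: IO ()
main = print
     $ maximumBy (comparing fst)      -- longest run, reported as (length, sum)
     $ concatMap runsFrom
     $ takeWhile (not . null)
     $ iterate tail                   -- successive suffixes = dropping the front
     $ takeWhile (< limit) primes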

Your prime generation is quadratic (isPrime 101 tests rem 101 100 == 0 even though 10 is the biggest number by which 101 needs to be tested -- and actually 7 is enough).
Yet even with it, a simple enough list-based code finds the answer in under 2 seconds (on an Intel Core i7 2.5 GHz, interpreted in GHCi). And with the code corrected to take advantage of the above mentioned optimization (and additionally, testing by primes only), it takes 0.1s.
Also, f x | t = False | otherwise = True is the same as f x = not t.
We are asked by the PE site not to give you even a hint.
But in general, the key to efficiency in Haskell, thanks to its laziness, is being generative with as small a duplication of effort as possible. As one example, instead of calculating each slice of a list in isolation starting anew, we can produce the bunch of them together as part of one process,
slices :: Int -> [a] -> [[a]]
slices n = map (take n) . iterate tail -- sequence of list's slices of length n each
Another principle is, try to solve a more general problem, of which yours is an instance.
Having written such a function, we can play with it by trying out different values for its parameters, from smaller to the bigger ones, for an exploratory style of problem solving. We're told about 21 consecutive primes. What about 22 of them? 27? 1127 of them? ... and I've said enough about this already.
If it starts taking too much time, we can assess the full solution's needed run time by empirical orders of growth analysis.
Though the solution is found quickly enough with your unoptimized isPrime code, the exploratory process can be prohibitively slow with it, but it is fast enough with the optimized code:
primes :: [Int]
primes = 2 : filter isPrime [3,5..]
isPrime n = and [rem n p > 0 | p <- takeWhile ((<= n).(^2)) primes]
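For reference (a quick check of my own in GHCi), the optimized generator starts out as expected:
*Main> take 10 primes
[2,3,5,7,11,13,17,19,23,29]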

Related

Are recursive calls in my "permutations with repetition" code accumulated to clog the RAM?

A bit of background:
I am an amateur programmer, having picked up Haskell a few months ago, in my spare time, after a period of Mathematica programming (my first language). I am currently going through my second Haskell book, by Will Kurt, but I still have miles to go before I can call myself comfortable around Haskell code. Codeabbey has been my platform for experimentation and learning so far.
I have written a piece of code to generate permutations of a given number, that deals with possible duplicate numbers, so for 588 it will internally generate 588, 858 and 885.
However, because I want to scale to pretty big input numbers (think perhaps even a hundred digits long), I don't want to output the whole list and then perform calculations on it; instead, every number that is generated is checked on the spot for a certain property, and if it has it, well, we have a winner: the number is returned as output and there's no need to go through the rest of the humongous list. If no desired number is found and we unsuccessfully go through all possible permutations, it outputs a "0".
I have also opted to make it a command line program to feed values to it via gnu parallel for faster work.
So here is the code
import System.Environment
import Data.List
toDigits :: Integer -> [Integer]
toDigits n = map (\n -> read [n]) (show n)

fromDigits :: Integral a => [a] -> Integer
fromDigits list = fromDigitsHelperFunction list 0

fromDigitsHelperFunction :: Integral a => [a] -> Integer -> Integer
fromDigitsHelperFunction [] acc = acc
fromDigitsHelperFunction (x:[]) acc = (fromIntegral x) + acc
fromDigitsHelperFunction digits@(x:xs) acc = fromDigitsHelperFunction xs (acc + ((fromIntegral x) * 10 ^ ((length digits) - 1)))

testPermutationsWithRepetition :: ([Integer],Int,[Int],[(Int,Integer)]) -> [Integer]
testPermutationsWithRepetition (digits, index, rotationMap, registeredPositions)
  | index == 0 && rotationMap !! index == 0 = [0,0,0] --finish state (no more recursion). Nothing more to do
  | index == digitsLength - 1 && beautyCheck (fromDigits digits) = digits
  | index == digitsLength - 1 = testPermutationsWithRepetition (digits, index-1, rotationMap, registeredPositions)
  | not ((index,digits!!index) `elem` registeredPositions) = testPermutationsWithRepetition (digits, index+1, rotationMap, (index,digits!!index):registeredPositions)
  | rotationMap!!index == 0 = testPermutationsWithRepetition (digits, index-1, restoredRotMap, restoredRegPositions)
  | rotationMap!!index > 0 && (index,digits!!index) `elem` registeredPositions = testPermutationsWithRepetition (shiftLDigits, index, subtractRot, registeredPositions)
  where
    digitsLength = length digits
    shiftLDigits = (fst splitDigits) ++ (tail $ snd splitDigits) ++ [head $ snd splitDigits]
    splitDigits = splitAt index digits
    restoredRotMap = (fst splitRotMap) ++ [digitsLength - index] ++ (tail $ snd splitRotMap)
    splitRotMap = splitAt index rotationMap
    restoredRegPositions = filter (\pos -> fst pos < index) registeredPositions --clear everything below the parent index
    subtractRot = (fst splitRotMap) ++ [(head $ snd splitRotMap) - 1] ++ (tail $ snd splitRotMap)

--Frontend function for testing permutations by inputting a single parameter (the number in digit form)
testPermsWithRep :: [Integer] -> [Integer]
testPermsWithRep digits = testPermutationsWithRepetition (digits, 0, [length $ digits, (length $ digits) -1 .. 1], [])

main :: IO ()
main = do
  args <- getArgs
  let number = read (head args) :: Integer
  let checkResult = fromDigits $ testPermsWithRep $ toDigits number
  print checkResult
It's really a sequential process with an index variable that points to a certain number on the digit list and performs a recursive call on that list based on my rules. The function tracks its progress through the digit list, recording visited numbers at specific positions (to avoid repeating already visited paths), until it gets to the last digit (index == length - 1). If the number we get there passes the beauty check, it exits with the number produced.
Now, in Mathematica (or, I guess, any imperative language) I would probably implement this with a While loop and Cases for its checks, and by the logic of the program, however long it took to compute (generate the permutations and check them for validity), it would take a moderate amount of memory: just enough to hold the list of "registeredPositions" (you could call it the record of visited digits at specific positions; it's a list that grows as we go deeper in index but gets cleaned up as we move back up). However, in this case the recursive calls seem to stack up, the whole thing acts like a fork bomb for sufficiently large numbers (e.g. 27777772222222222222222223333), and it eventually crashes. Is this behaviour something that can be handled differently in Haskell, or is there no way to avoid the recursion and memory hogging?
I really like Haskell because the programs make logical sense, but I would like to use it also for cases like this where performance (and resources) matters.
As a side note, my brother pointed me to this Algorithm to print all permutations with repetition of numbers in C, which is reasonably fast (it only generates a list, though) and, most importantly, has a minimal memory footprint, although I can tell there's also recursion used in it. Other than that I'm clueless when it comes to C, and I would like to stick to Haskell if it can do what I want at the end of the day.
Any help is welcome. Have a good day!
Edit:
Per Soleil's suggestion I update my post with additional info provided in the comments. Specifically:
After compiling with "ghc checking_program.hs" I run the program with "./checking_program 27777772222222222222222223333". On an i5 3470 with 4GB RAM it runs for about 10 minutes and exits with a segmentation fault. On my brother's 32GB machine he let it run until it took up 20GB of RAM. No need to go further, I guess. My tests were on Ubuntu via Win10 WSL; his were on bare Linux.
testPermsWithRep is just a front end for testPermutationsWithRepetition, so that I only have to provide the number; testPermsWithRep creates the initial parameters and calls testPermutationsWithRepetition with them. It outputs exactly what testPermutationsWithRepetition outputs: either a number (in digit form) that passes the test, or [0,0,0]. As for the test, the beautyCheck function is simply a test for single-digit divisors of that number, returning True or False. I didn't include it because it really is inconsequential; it could even be just a "bigger than x" test.
As an example, calling "testPermsWithRep [2,6,7,3]" will call "testPermutationsWithRepetition ([2,6,7,3], 0, [4,3,2,1], [])", and whatever comes out of that function, testPermsWithRep will return as well.
The performance issue with your program doesn't have anything to do with recursion. Rather, you seem to be running up against an accumulation of a partially evaluated, lazy data structure in your rotation map. Your program will run in constant memory if you use the deepseq package to fully force evaluation of the restoredRotMap:
-- Install the `deepseq` package and add this import
import Control.DeepSeq
-- And then change this one case
... | rotationMap!!index == 0 = restoredRotMap `deepseq`
        testPermutationsWithRepetition (digits, index-1, restoredRotMap, restoredRegPositions)
Compiled with ghc -O2 and using beautyCheck _ = False, this runs with a fixed resident memory usage of about 6 megs.
Some other performance targets:
You might want to replace most of your Integer types with Int, as this will be faster. I think you only need Integer for the input to toDigits and the output of fromDigits, and everything else can be Int, since it's all indexes and digits.
An even bigger win will be to replace your rotation map and registered positions with better data structures. If you find yourself splicing up lists with lots of listpart1 ++ [x] ++ listpart2 calls, there are going to be enormous performance costs to that, and the linear lookups with (!!) aren't helping either.
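For the rotation map specifically, one option (a sketch of my own, not code from this answer) is Data.Sequence, which supports O(log n) indexing and updates, so both the (!!) lookups and the listpart1 ++ [x] ++ listpart2 splices go away; the registered positions could get a similar treatment with Data.Set.
import qualified Data.Sequence as Seq
import Data.Sequence (Seq)

type RotationMap = Seq Int

-- the initial map [len, len-1 .. 1] from testPermsWithRep, as a Seq
mkRotationMap :: Int -> RotationMap
mkRotationMap len = Seq.fromList [len, len - 1 .. 1]

-- replaces rotationMap !! index
rotationAt :: Int -> RotationMap -> Int
rotationAt i rm = Seq.index rm i

-- replaces (fst splitRotMap) ++ [newVal] ++ (tail (snd splitRotMap))
setRotation :: Int -> Int -> RotationMap -> RotationMap
setRotation i newVal rm = Seq.update i newVal rm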
So I am not 100% sure of this and I am also not 100% sure I understand your code.
But as far as I understand, you are generating permutations without duplicates and then checking them against some predicate, wanting whichever single number fulfils it first.
I think it should help to use as many of the Prelude functions as possible, because as far as I know the compiler can then optimize recursion into a loop. As a rule of thumb I was taught to avoid explicit recursion as much as possible and instead use Prelude functions like map, filter and fold. Mainly you avoid reinventing the wheel this way, but there should also be a higher chance of the compiler optimizing things.
So, to solve your problem, try generating a list of all permutations, then filter it using filter, and then just do take 1 if you want the first result found. Because of Haskell's lazy evaluation, take 1 means we are only interested in the first x of the (x:xs) that filter would return; filter therefore keeps dropping elements from the (again lazily evaluated) list of permutations and stops as soon as it finds one.
I found a permutation implementation on https://rosettacode.org/wiki/Permutations#Haskell
and used it to try this call:
take 1 $ filter ((> 67890123456789012345) . fromDigits) $ permutations' $ toDigits 12345678901234567890
it has been running for like 20 minutes now and RAM usage has stayed around 230 MB.
I hope that has answered/helped you at least in some way.
+ a bonus tip: you can simplify your fromDigits to this beautiful thing:
fromDigits :: Integral a => [a] -> Integer
fromDigits = foldl shiftAndAdd 0
  where shiftAndAdd acc d = 10 * acc + fromIntegral d
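A quick worked evaluation of that fold (my own example):
-- fromDigits [5,8,8]
--   = ((0*10 + 5)*10 + 8)*10 + 8
--   = 588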
EDIT:
I read some more of the comments and I see you care about ignoring duplicates, but I'm afraid you'll have to get smarter about that: if I understand correctly, your implementation still generates all the duplicates and only throws them away after checking whether they are in a list (which is an O(n) check). And when you only care about finding one permutation that fits your predicate, you drop the non-fitting ones anyway.
And people have already correctly pointed out that !! is generally also very bad.
Thanks to everyone for your helpful answers and comments.
#lordQuick permutations used with filter is still terrible, but that fromDigits code is a beauty, so I used it.
#k-a-buhr That's exactly what I did yesterday, also per others' suggestions: I replaced all uses of !! and ++. When I did the latter, all memory problems disappeared. Wow! I mean, I knew ++ was bad, I just didn't realise how bad. We're talking orders of magnitude bad: 3MB of RAM vs several GB. Also, valid point about integers; I will try that.
Oh, also a very important thing: I replaced the recursive calls with until. This is the approach I would have followed in Mathematica (a NestWhile function, to be exact), and I'm glad I found it in Haskell. It seemed to make things a bit faster, too.
Anyway, the revised code, that solves my memory issues is here for anyone if interested.
{-compiled with "ghc -Rghc-timing -O2 checking_program_v3.hs"-}
import System.Environment
import Data.List
--A little help with triples
fstOfThree (a, _, _) = a
sndOfThree (_, b, _) = b
thrOfThree (_, _, c) = c
--And then some with quads
fstOfFour (a, _, _, _) = a
sndOfFour (_, b, _, _) = b
thrOfFour (_, _, c, _) = c
--This function is a single pass test for single digit factors
--It will be called as many times as needed by pryForSDFactors
trySingleDigitsFactors :: (Bool, Integer, [Integer]) -> (Bool, Integer, [Integer])
trySingleDigitsFactors (True, n, f) = (True, n, f)
trySingleDigitsFactors (b, n, []) = (b, n, [])
trySingleDigitsFactors (b, n, (f:fs))
  | mod n f == 0 = (True, div n f, fs)
  | otherwise = trySingleDigitsFactors (False, n, fs)

--This function will take a number and repeatedly divide by single digits till it gets to a single digit if possible
--Then it will return True
pryForSDFactors :: Integer -> Bool
pryForSDFactors n
  | sndOfThree sdfTry < 10 = True
  | fstOfThree sdfTry == True = pryForSDFactors $ sndOfThree sdfTry
  | otherwise = False
  where sdfTry = trySingleDigitsFactors (False, n, [7,5,3,2])

toDigits :: Integer -> [Integer]
toDigits n = map (\n -> read [n]) (show n)

fromDigits :: Integral a => [a] -> Integer
fromDigits = foldl shiftAndAdd 0
  where shiftAndAdd acc d = 10 * acc + fromIntegral d
replaceElementAtPos :: a -> Int -> [a] -> [a]
replaceElementAtPos newElement pos [] = []
replaceElementAtPos newElement 0 (x:xs) = newElement:xs
replaceElementAtPos newElement pos (x:xs) = x : replaceElementAtPos newElement (pos-1) xs
checkPermutationsStep :: ([Integer],Int,[Int],[(Int,Integer)]) -> ([Integer],Int,[Int],[(Int,Integer)])
checkPermutationsStep (digits, index, rotationMap, registeredPositions)
  | index == digitsLength - 1 = (digits, index-1, rotationMap, registeredPositions)
  | not ((index, digitAtIndex) `elem` registeredPositions) = (digits, index+1, rotationMap, (index,digitAtIndex):registeredPositions)
  | rotationAtIndex == 0 = (digits, index-1, restoredRotMap, restoredRegPositions)
  | rotationAtIndex > 0 && (index, digitAtIndex) `elem` registeredPositions = (shiftLDigits, index, subtractRot, registeredPositions)
  where
    digitsLength = length digits
    digitAtIndex = head $ drop index digits
    rotationAtIndex = head $ drop index rotationMap
    --restoredRotMap = (fst splitRotMap) ++ [digitsLength - index] ++ (tail $ snd splitRotMap)
    restoredRotMap = replaceElementAtPos (digitsLength - index) index rotationMap
    --splitRotMap = splitAt index rotationMap
    restoredRegPositions = filter (\pos -> fst pos < index) registeredPositions --clear everything below the parent index
    shiftLDigits = (fst splitDigits) ++ (tail $ snd splitDigits) ++ [head $ snd splitDigits]
    splitDigits = splitAt index digits
    --subtractRot = (fst splitRotMap) ++ [(head $ snd splitRotMap) - 1] ++ (tail $ snd splitRotMap)
    subtractRot = replaceElementAtPos (rotationDigitAtIndex - 1) index rotationMap
    rotationDigitAtIndex = head $ drop index rotationMap

checkConditions :: ([Integer],Int,[Int],[(Int,Integer)]) -> Bool
checkConditions (digits, index, rotationMap, registeredPositions)
  | (index == 0 && rotationAtIndex == 0) || ((index == (length digits) - 1) && pryForSDFactors (fromDigits digits)) = True
  | otherwise = False
  where rotationAtIndex = head $ drop index rotationMap

testPermsWithRep :: Integer -> Integer
testPermsWithRep n
  | sndOfFour computationResult == 0 && (head . thrOfFour) computationResult == 0 = 0
  | otherwise = (fromDigits . fstOfFour) computationResult
  where
    computationResult = until checkConditions checkPermutationsStep (digitsOfn, 0 , [digitsLength, digitsLength -1 .. 1], [])
    digitsOfn = toDigits n
    digitsLength = length digitsOfn

main :: IO ()
main = do
  args <- getArgs
  let inputNumber = read (head args) :: Integer
  let checkResult = testPermsWithRep inputNumber
  print checkResult
Now, bear in mind that this code, as I've mentioned, checks a condition for each generated permutation (single-digit factors) on the spot and moves on if it's False, but it's pretty easy to repurpose it to generate the output list instead.
Sure, it's still inefficient in terms of big-O complexity (it scales terribly), and I was at first thinking of replacing the lists with Data.Map, because that's what I've learned so far (though I'm not so comfortable with maps yet).
I've also read that there's a more efficient replacement for read, since that is also called a lot for number-to-digit conversions.
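One candidate I'd reach for (my own sketch, not something named in the thread) is Data.Char.digitToInt, which works character by character and avoids a Read parse per digit; it also returns Int, which lines up with the earlier advice to prefer Int for digits.
import Data.Char (digitToInt)

-- read-free digit conversion for non-negative numbers
toDigitsFast :: Integer -> [Int]
toDigitsFast = map digitToInt . show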
# lordQuick I don't know about HashMaps or vectors yet but I'm still learning. Every little optimization will pay off in computation time because this is my first piece of "practical" code, not just Codeabbey credit
Cheers!
Here is a solution using a more efficient, insertion-based algorithm to compute unique permutations:
import Data.List
permutationsNub :: Eq a => [a] -> [[a]]
permutationsNub = foldr (concatMap . insert) [[]]
  where
    insert y = foldr combine [[y]] . (zip <*> tail . tails)
      where
        combine (x, xs) xss =
          (y : x : xs) : if y == x then [] else map (x :) xss

headDef :: a -> [a] -> a
headDef x [] = x
headDef x (h : t) = h

fromDigits :: Integral a => [a] -> Integer
fromDigits = foldl1' ((+) . (10 *)) . map fromIntegral

toDigits :: Integer -> [Int]
toDigits = map (read . pure) . show

firstValidPermutation :: (Integer -> Bool) -> Integer -> Integer
firstValidPermutation p =
  headDef 0 .
  filter p .
  map fromDigits .
  permutationsNub .
  toDigits
The basic idea is that, given the unique permutations of a list's tail, we can compute the unique permutations of the whole list by inserting its head into all of the tail's permutations, in every position that doesn't follow an occurrence of the head (to avoid creating duplicates). From my tests, permutationsNub seems to be faster than permutations from Data.List even when the input contains no repetitions. However, unlike that function, it consumes its input eagerly and thus cannot handle an infinite input. Exercise: Prove this algorithm's correctness.
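A quick check I ran with the definitions above (my own example, not part of the answer); each distinct arrangement shows up exactly once:
*Main> permutationsNub "aab"
["aab","aba","baa"]
*Main> map fromDigits (permutationsNub (toDigits 588))
[588,858,885]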

Why does my function not work with an infinite list?

I'm trying to learn Haskell and implemented a function conseq that would return a list of consecutive elements of size n.
conseq :: Int -> [Int] -> [[Int]]
conseq n x
  | n == length(x) = [x]
  | n > length(x) = [x]
  | otherwise = [take n x] ++ (conseq n (drop 1 x))
This works correctly.
> take 5 $ conseq 2 [1..10]
[[1,2],[2,3],[3,4],[4,5],[5,6]]
However, if I pass [1..] instead of [1..10], the program gets stuck in an infinite loop.
As I understood it, Haskell has lazy evaluation, so I should still be able to get the same result, right? Is it length? Shouldn't the first two conditions evaluate to false as soon as the length becomes greater than n?
What did I misunderstand?
One of the main reasons why using length is not a good idea is that, when it has to be evaluated on an infinite list, it will get stuck in an infinite loop.
The good news is, however, that we don't need length (it would also make the time complexity worse). We can work with two enumerators, one of which runs n-1 places ahead of the other. If the one running ahead reaches the end of the list, then we know the other one has only n-1 elements left, and thus we can stop yielding values:
conseq :: Int -> [a] -> [[a]]
conseq n ys = go (drop (n-1) ys) ys
  where
    go [] _ = []
    go (_:as) ba@(~(_:bs)) = take n ba : go as bs
This gives us thus:
Prelude> conseq 3 [1 ..]
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7],[6,7,8],[7,8,9],[8,9,10],[9,10,11],[10,11,12],[11,12,13],[12,13,14],[13,14,15],[14,15,16],[15,16,17],[16,17,18],[17,18,19],[18,19,20],[19,20,21],[20,21,22],[21,22,23],[22,23,24],[23,24,25],[24,25,26],[25,26,27],…
Prelude> conseq 3 [1 .. 4]
[[1,2,3],[2,3,4]]
The first thing your function does is calculate length(x), so it knows whether it should return [x], [x], or [take n x] ++ (conseq n (drop 1 x))
length counts the number of elements in the list - all the elements. If you ask for the length of an infinite list, it never finishes counting.
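The same drop-based trick the answer above uses inside conseq can be packaged as a reusable predicate for the original guards (my own illustration, not from either answer):
-- True when xs has at least n elements (for n >= 1); it inspects at most n
-- cons cells, so it terminates even on an infinite list.
atLeast :: Int -> [a] -> Bool
atLeast n xs = not (null (drop (n - 1) xs))
With it, the n > length(x) guard in the question becomes not (atLeast n x) without ever counting the whole list.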

Using fold* to grow a list in Haskell

I'm trying to solve the following problem in Haskell: given an integer return the list of its digits. The constraint is I have to only use one of the fold* functions (* = {r,l,1,l1}).
Without such constraint, the code is simple:
list_digits :: Int -> [Int]
list_digits 0 = []
list_digits n = list_digits r ++ [n-10*r]
  where
    r = div n 10
But how do I use fold* to, essentially grow a list of digits from an empty list?
Thanks in advance.
Is this a homework assignment? It's pretty strange for the assignment to require you to use foldr, because this is a natural use for unfoldr, not foldr. unfoldr :: (b -> Maybe (a, b)) -> b -> [a] builds a list, whereas foldr :: (a -> b -> b) -> b -> [a] -> b consumes a list. An implementation of this function using foldr would be horribly contorted.
import Data.List (unfoldr)

listDigits :: Int -> [Int]
listDigits = unfoldr digRem
  where
    digRem x
      | x <= 0 = Nothing
      | otherwise = Just (x `mod` 10, x `div` 10)
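A quick check (my own example); note that with this definition the least significant digit comes out first, unlike the asker's list_digits:
>>> listDigits 1234
[4,3,2,1]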
In the language of imperative programming, this is basically a while loop. Each iteration of the loop appends x `mod` 10 to the output list and passes x `div` 10 to the next iteration. In, say, Python, this'd be written as
def list_digits(x):
    output = []
    while x > 0:
        output.append(x % 10)
        x = x // 10
    return output
But unfoldr allows us to express the loop at a much higher level. unfoldr captures the pattern of "building a list one item at a time" and makes it explicit. You don't have to think through the sequential behaviour of the loop and realise that the list is being built one element at a time, as you do with the Python code; you just have to know what unfoldr does. Granted, programming with folds and unfolds takes a little getting used to, but it's worth it for the greater expressiveness.
If your assignment is marked by machine and it really does require you to type the word foldr into your program text, (you should ask your teacher why they did that and) you can play a sneaky trick with the following "id[]-as-foldr" function:
obfuscatedId = foldr (:) []
listDigits = obfuscatedId . unfoldr digRem
Though unfoldr is probably what the assignment meant, you can write this using foldr if you use foldr as a hylomorphism, that is, building up one list while it tears another down.
digits :: Int -> [Int]
digits n = snd $ foldr go (n, []) places where
  places = replicate num_digits ()
  num_digits
    | n > 0 = 1 + floor (logBase 10 $ fromIntegral n)
    | otherwise = 0
  go () (n, ds) = let (q,r) = n `quotRem` 10 in (q, r : ds)
Effectively, what we're doing here is using foldr as "map-with-state". We know ahead of time how many digits we need to output (using log10), just not what those digits are, so we use unit (()) values as stand-ins for those digits.
If your teacher's a stickler for just having a foldr at the top level, you can get away with making go partial:
digits' :: Int -> [Int]
digits' n = foldr go [n] places where
  places = replicate num_digits ()
  num_digits
    | n > 0 = floor (logBase 10 $ fromIntegral n)
    | otherwise = 0
  go () (n:ds) = let (q,r) = n `quotRem` 10 in (q:r:ds)
This has slightly different behaviour on non-positive numbers:
>>> digits 1234567890
[1,2,3,4,5,6,7,8,9,0]
>>> digits' 1234567890
[1,2,3,4,5,6,7,8,9,0]
>>> digits 0
[]
>>> digits' 0
[0]
>>> digits (negate 1234567890)
[]
>>> digits' (negate 1234567890)
[-1234567890]

How can this haskell rolling sum implementation be improved?

How can I improve the following rolling sum implementation?
import Control.Monad.State -- for State, get, put, runState

type Buffer = State BufferState (Maybe Double)
type BufferState = ( [Double] , Int, Int )

-- circular buffer
buff :: Double -> Buffer
buff newVal = do
  ( list, ptr, len) <- get
  -- if the list is not full yet just accumulate the new value
  if length list < len
    then do
      put ( newVal : list , ptr, len)
      return Nothing
    else do
      let
        nptr = (ptr - 1) `mod` len
        (as,(v:bs)) = splitAt ptr list
        nlist = as ++ (newVal : bs)
      put (nlist, nptr, len)
      return $ Just v

-- create initial state for circular buffer
initBuff l = ( [] , l-1 , l)

-- use the circular buffer to calculate a rolling sum
rollSum :: Double -> State (Double,BufferState) (Maybe Double)
rollSum newVal = do
  (acc,bState) <- get
  let
    (lv , bState' ) = runState (buff newVal) bState
    acc' = acc + newVal
  -- subtract the old value if the circular buffer is full
  case lv of
    Just x -> put ( acc' - x , bState') >> (return $ Just (acc' - x))
    Nothing -> put ( acc' , bState') >> return Nothing

test :: (Double,BufferState) -> [Double] -> [Maybe Double] -> [Maybe Double]
test state [] acc = acc
test state (x:xs) acc =
  let (a,s) = runState (rollSum x) state
  in test s xs (a:acc)

main :: IO()
main = print $ test (0,initBuff 3) [1,1,1,2,2,0] []
Buffer uses the State monad to implement a circular buffer. rollSum uses the State monad again to keep track of the rolling sum value and the state of the circular buffer.
How could I make this more elegant?
I'd like to implement other functions like rolling average or a difference, what could I do to make this easy?
Thanks!
EDIT
I forgot to mention I am using a circular buffer as I intend to use this code on-line and process updates as they arrive - hence the need to record state. Something like
newRollingSum = update rollingSum newValue
I haven't managed to decipher all of your code, but here is the plan I would take for solving this problem. First, an English description of the plan:
* We need windows into the list of length n starting at each index.
* Make windows of arbitrary length.
* Truncate long windows to length n.
* Drop the last n-1 of these, which will be too short.
* For each window, add up the entries.
This was the first idea I had; for windows of length three it's an okay approach because step 2 is cheap on such a short list. For longer windows, you may want an alternate approach, which I will discuss below; but this approach has the benefit that it generalizes smoothly to functions other than sum. The code might look like this:
import Data.List
rollingSums n xs
  = map sum                                -- add up the entries
  . zipWith (flip const) (drop (n-1) xs)   -- drop the last n-1
  . map (take n)                           -- truncate long windows
  . tails                                  -- make arbitrarily long windows
  $ xs
If you're familiar with the "equational reasoning" approach to optimization, you might spot a first place we can improve the performance of this function: by swapping the first map and zipWith, we can produce a function with the same behavior but with a map f . map g subterm, which can be replaced by map (f . g) to get slightly less allocation.
Unfortunately, for large n, this adds n numbers together in the inner loop; we would prefer to simply add the value at the "front" of the window and subtract the one at the "back". So we need to get trickier. Here's a new idea: we'll traverse the list twice in parallel, n positions apart. Then we'll use a simple function for getting the rolling sum (of unbounded window length) of prefixes of a list, namely, scanl (+), to convert this traversal into the actual sums we're interested in.
rollingSumsEfficient n xs = scanl (+) firstSum deltas where
  firstSum = sum (take n xs)
  deltas = zipWith (-) (drop n xs) xs -- front - back
There's one twist, which is that scanl never returns an empty list. So if it's important that you be able to handle short lists, you'll want another equation that checks for these. Don't use length, as that forces the entire input list into memory before starting the computation -- a potentially lethal performance mistake. Instead add a line like this above the previous definition:
rollingSumsEfficient n xs | null (drop (n-1) xs) = []
We can try these two out in ghci. You'll notice that they do not quite have the same behavior as yours:
*Main> rollingSums 3 [10^n | n <- [0..5]]
[111,1110,11100,111000]
*Main> rollingSumsEfficient 3 [10^n | n <- [0..5]]
[111,1110,11100,111000]
On the other hand, the implementations are considerably more concise and are fully lazy in the sense that they work on infinite lists:
*Main> take 5 . rollingSums 10 $ [1..]
[55,65,75,85,95]
*Main> take 5 . rollingSumsEfficient 10 $ [1..]
[55,65,75,85,95]
An efficient implementation for a rolling sum in Haskell:
rollingSums :: Num a => Int -> [a] -> Maybe [a]
rollingSums n xs
  | n <= 0 = Nothing
  | otherwise = Just $ if length as == n then go (sum as) xs bs else []
  where
    (as, bs) = splitAt n xs
    go s xs [] = [s]
    go s xs (y:ys) = s : go (s + y - head xs) (tail xs) ys
Assuming that sum(arr[i+1 .. i+n]) = sum(arr[i .. i+n-1]) - arr[i] + arr[i+n], i.e. each new window sum is the previous one minus the element that left plus the element that entered.
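For example (my own check), on the same input the question's main uses:
*Main> rollingSums 3 [1,1,1,2,2,0]
Just [3,4,5,4]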

Dovetail iteration over infinite lists in Haskell

I want to iterate 2 (or 3) infinite lists and find the "smallest" pair that satisfies a condition, like so:
until pred [(a,b,c) | a<-as, b<-bs, c<-cs]
  where
    pred (a,b,c) = a*a + b*b == c*c
    as = [1..]
    bs = [1..]
    cs = [1..]
The above wouldn't get very far, as a == b == 1 throughout the run of the program.
Is there a nice way to dovetail the problem, e.g. build the infinite sequence [(1,1,1),(1,2,1),(2,1,1),(2,1,2),(2,2,1),(2,2,2),(2,2,3),(2,3,2),..] ?
Bonus: is it possible to generalize to n-tuples?
There's a monad for that, Omega (Control.Monad.Omega from the omega package; each and runOmega below come from it, liftA3 from Control.Applicative, and mfilter from Control.Monad).
Prelude> let as = each [1..]
Prelude> let x = liftA3 (,,) as as as
Prelude> let x' = mfilter (\(a,b,c) -> a*a + b*b == c*c) x
Prelude> take 10 $ runOmega x'
[(3,4,5),(4,3,5),(6,8,10),(8,6,10),(5,12,13),(12,5,13),(9,12,15),(12,9,15),(8,15,17),(15,8,17)]
Using its applicative features, you can generalize to arbitrary tuples:
quadrupels = (,,,) <$> as <*> as <*> as <*> as -- or call it liftA4
But: this alone does not eliminate duplication, of course. It only gives you proper diagonalization. Maybe you could use monad comprehensions together with an approach like Thomas's, or just another mfilter pass (restricting to b /= c, in this case).
List comprehensions are great (and concise) ways to solve such problems. First, you know you want all combinations of (a,b,c) that might satisfy a^2 + b^2 = c^2 - a helpful observation is that (considering only positive numbers) it will always be the case that a <= c && b <= c.
To generate our list of candidates we can thus say c ranges from 1 to infinity while a and b range from one to c.
[(a,b,c) | c <- [1..], a <- [1..c], b <- [1..c]]
To get to the solution we just need to add your desired equation as a guard:
[(a,b,c) | c <- [1..], a <- [1..c], b <- [1..c], a*a+b*b == c*c]
This is inefficient, but the output is correct:
[(3,4,5),(4,3,5),(6,8,10),(8,6,10),(5,12,13),(12,5,13),(9,12,15)...
There are more principled methods than blind testing that can solve this problem.
{- It depends on what is "smallest". But here is a solution for a concept of "smallest" if tuples were compared first by their max. number and then by their total sum. (You can just copy and paste my whole answer into a file as I write the text in comments.)
We will need nub later. -}
import Data.List (nub)
{- Just for illustration: the easy case with 2-tuples. -}
-- all the two-tuples where 'snd' is 'n'
tuples n = [(i, n) | i <- [1..n]]
-- all the two-tuples where 'snd' is in '1..n'
tuplesUpTo n = concat [tuples i | i <- [1..n]]
{-
To get all results, you will need to insert the flip of each tuple into the stream. But let's do that later and generalize first.
Building tuples of arbitrary length is somewhat difficult, so we will work on lists. I call them 'kList's, if they have a length 'k'.
-}
-- just copied from the tuples case, only we need a base case for k=1 and
-- we can combine all results utilizing the list monad.
kLists 1 n = [[n]]
kLists k n = do
  rest <- kLists (k-1) n
  add <- [1..head rest]
  return (add:rest)
-- same as above. all the klists with length k and max number of n
kListsUpTo k n = concat [kLists k i | i <- [1..n]]
-- we can do that unbounded as well, creating an infinite list.
kListsInf k = concat [kLists k i | i <- [1..]]
{-
The next step is rotating these lists around, because until now the largest number is always in the last place. So we just look at all rotations to get all the results. Using nub here is admittedly awkward, you can improve that. But without it, lists where all elements are the same are repeated k times.
-}
rotate n l =
    let (init, end) = splitAt n l
    in end ++ init
rotations k l = nub [rotate i l | i <- [0..k-1]]
rotatedKListsInf k = concatMap (rotations k) $ kListsInf k
{- What remains is to convert these lists into tuples. This is a bit awkward, because every n-tuple is a separate type. But it's straightforward, of course. -}
kListToTuple2 [x,y] = (x,y)
kListToTuple3 [x,y,z] = (x,y,z)
kListToTuple4 [x,y,z,t] = (x,y,z,t)
kListToTuple5 [x,y,z,t,u] = (x,y,z,t,u)
kListToTuple6 [x,y,z,t,u,v] = (x,y,z,t,u,v)
{- Some tests:
*Main> take 30 . map kListToTuple2 $ rotatedKListsInf 2
[(1,1),(1,2),(2,1),(2,2),(1,3),(3,1),(2,3),(3,2),(3,3),(1,4),(4,1),(2,4),(4,2),(3,4),
(4,3),(4,4),(1,5),(5,1),(2,5),(5,2),(3,5),(5,3),(4,5),(5,4),(5,5),(1,6),(6,1),
(2,6), (6,2), (3,6)]
*Main> take 30 . map kListToTuple3 $ rotatedKListsInf 3
[(1,1,1),(1,1,2),(1,2,1),(2,1,1),(1,2,2),(2,2,1),(2,1,2),(2,2,2),(1,1,3),(1,3,1),
(3,1,1),(1,2,3),(2,3,1),(3,1,2),(2,2,3),(2,3,2),(3,2,2),(1,3,3),(3,3,1),(3,1,3),
(2,3,3),(3,3,2),(3,2,3),(3,3,3),(1,1,4),(1,4,1),(4,1,1),(1,2,4),(2,4,1),(4,1,2)]
Edit:
I realized there is a bug: Just rotating the ordered lists isn't enough of course. The solution must be somewhere along the lines of having
rest <- concat . map (rotations (k-1)) $ kLists (k-1) n
in kLists, but then some issues with repeated outputs arise. You can figure that out, I guess. ;-)
-}
It really depends on what you mean by "smallest", but I assume you want to find a tuple of numbers with respect to its maximal element - so (2,2) is less than (1,3) (while standard Haskell ordering is lexicographic).
There is the package data-ordlist, which is aimed precisely at working with ordered lists. Its function mergeAll (and mergeAllBy) allows you to combine a 2-dimensional matrix, ordered in each direction, into an ordered list.
First let's create a desired comparing function on tuples:
import Data.List (find)
import Data.List.Ordered
compare2 :: (Ord a) => (a, a) -> (a, a) -> Ordering
compare2 x y = compare (max2 x, x) (max2 y, y)
  where
    max2 :: Ord a => (a, a) -> a
    max2 (x, y) = max x y
Then using mergeAll we create a function that takes a comparator, a combining function (which must be monotonic in both arguments) and two sorted lists. It combines all possible elements from the two lists using the function and produces a result sorted list:
mergeWith :: (b -> b -> Ordering) -> (a -> a -> b) -> [a] -> [a] -> [b]
mergeWith cmp f xs ys = mergeAllBy cmp $ map (\x -> map (f x) xs) ys
With this function, it's very simple to produce tuples ordered according to their maximum:
incPairs :: [(Int,Int)]
incPairs = mergeWith compare2 (,) [1..] [1..]
Its first 10 elements are:
> take 10 incPairs
[(1,1),(1,2),(2,1),(2,2),(1,3),(2,3),(3,1),(3,2),(3,3),(1,4)]
and when we (for example) look for the first pair whose sum of squares is equal to 65:
find (\(x,y) -> x^2+y^2 == 65) incPairs
we get the correct result (4,7) (as opposed to (1,8) if lexicographic ordering were used).
This answer is for the more general problem of an unknown predicate. If the predicate is known, more efficient solutions are possible; others have listed solutions based on the knowledge that you don't need to iterate over all Ints for a given c.
When dealing with infinite lists, you need to perform a breadth-first search for a solution. The list comprehension only affords depth-first search, which is why you never arrive at a solution in your original code.
counters 0 xs = [[]]
counters n xs = concat $ foldr f [] gens where
  gens = [[x:t | t <- counters (n-1) xs] | x <- xs]
  f ys n = cat ys ([]:n)
  cat (y:ys) (x:xs) = (y:x): cat ys xs
  cat [] xs = xs
  cat xs [] = [xs]

main = print $ take 10 $ filter p $ counters 3 [1..] where
  p [a,b,c] = a*a + b*b == c*c
counters generates all possible counters for values from the specified range of digits, including an infinite range.
First, we obtain a list of generators of valid combinations of counters: for each permitted digit, combine it with all permitted combinations for counters of smaller size. This may result in a generator that produces an infinite number of combinations. So, we need to borrow from each generator evenly.
So gens is a list of generators. Think of this as a list of all counters starting with one digit: gens !! 0 is a list of all counters starting with 1, gens !! 1 is a list of all counters starting with 2, etc.
In order to borrow from each generator evenly, we could transpose the list of generators - that way we would get a list of first elements of the generators, followed by a list of second elements of the generators, etc.
Since the list of generators may be infinite, we cannot afford to transpose it, because we might never get to look at the second element of any generator (for an infinite number of digits we'd have an infinite number of generators). So we enumerate the elements from the generators "diagonally": take the first element from the first generator; then the second element from the first generator and the first from the second generator; then the third element from the first generator, the second from the second, and the first from the third, and so on. This can be done by folding the list of generators with a function f which zips together two lists (one list is the generator, the other is the already-zipped generators), the beginning of one of them being offset by one step by adding []: to the head. This is almost zipWith (:) ys ([]:n); the difference is that if n or ys is shorter than the other one, we don't drop the remainder of the other list. Note that folding with zipWith (:) ys n would be a transpose.
For this answer I will take "smallest" to refer to the sum of the numbers in the tuple.
To list all possible pairs in order, you can first list all of the pairs with a sum of 2, then all pairs with a sum of 3 and so on. In code
pairsWithSum n = [(i, n-i) | i <- [1..n-1]]
xs = concatMap pairsWithSum [2..]
Haskell doesn't have facilities for dealing with n-tuples without using Template Haskell, so to generalize this you will have to switch to lists.
ntuplesWithSum 1 s = [[s]]
ntuplesWithSum n s = concatMap (\i -> map (i:) (ntuplesWithSum (n-1) (s-i))) [1..s-n+1]
nums n = concatMap (ntuplesWithSum n) [n..]
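For example (my own check), the triples come out grouped by increasing sum:
> take 4 (nums 3)
[[1,1,1],[1,1,2],[1,2,1],[2,1,1]]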
Here's another solution, with probably another slightly different idea of "smallest". My order is just "all tuples with max element N come before all tuples with max element N+1". I wrote the versions for pairs and triples:
gen2_step :: Int -> [(Int, Int)]
gen2_step s = [(x, y) | x <- [1..s], y <- [1..s], (x == s || y == s)]
gen2 :: Int -> [(Int, Int)]
gen2 n = concatMap gen2_step [1..n]
gen2inf :: [(Int, Int)]
gen2inf = concatMap gen2_step [1..]
gen3_step :: Int -> [(Int, Int, Int)]
gen3_step s = [(x, y, z) | x <- [1..s], y <- [1..s], z <- [1..s], (x == s || y == s || z == s)]
gen3 :: Int -> [(Int, Int, Int)]
gen3 n = concatMap gen3_step [1..n]
gen3inf :: [(Int, Int, Int)]
gen3inf = concatMap gen3_step [1..]
You can't really generalize it to N-tuples, though as long as you stay homogeneous, you may be able to generalize it if you use arrays. But I don't want to tie my brain into that knot.
I think this is the simplest solution if "smallest" is defined as x+y+z, because after you find your first solution in the space of Integral-valued Pythagorean triangles, the subsequent solutions from the infinite list are bigger.
take 1 [(x,y,z) | y <- [1..], x <- [1..y], z <- [1..x], z*z + x*x == y*y]
-> [(4,5,3)]
It has the nice property that it returns each symmetrically unique solution only once. x and z are also unbounded, because y is.
This does not work, because the sequence for x never finishes, and thus you never get a value for y, not to mention z. The rightmost generator is the innermost loop.
take 1 [(z,y,x)|z <- [1..],y <- [1..],x <- [1..],x*x + y*y == z*z]
Sorry, it's been quite a while since I did Haskell, so I'm going to describe it in words.
As I pointed out in my comment, it is not possible to find the smallest anything in an infinite list, since there could always be a smaller one.
What you can do is take a stream-based approach: take the lists and return a list with only the 'valid' elements, i.e. the ones where the condition is met. Let's call this function triangle.
You can then compute the triangle list to some extent with take n (triangle ...), and from these n elements you can find the minimum.
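A rough sketch of what that stream-based triangle function might look like (my own reading of the description, specialized to the Pythagorean condition; the bounds a <= b <= c are an extra assumption to keep each solution unique):
-- only 'valid' triples ever appear in the stream; consumers take as many as they need
triangle :: [(Int, Int, Int)]
triangle = [ (a, b, c)
           | c <- [1 ..], b <- [1 .. c], a <- [1 .. b]
           , a*a + b*b == c*c ]

-- e.g. take 3 triangle == [(3,4,5),(6,8,10),(5,12,13)]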
