Haskell parallel search with early abort - haskell

I'd like to search through a list, testing each element for property X and then return when an element with property X is found.
This list is very large and would benefit from parallelism, but the cost of the spark is rather high relative to the compute time. parListChunk would be great, but then it must search through the entire list.
Is there some way I can write something like parListChunk but with early abort?
This is the naive search code:
hasPropertyX :: Object -> Bool
anyObjectHasPropertyX :: [Object] -> Bool
anyObjectHasPropertyX [] = False
anyObjectHasPropertyX l
| hasPropertyX (head l) == True = True
| otherwise = anyObjectHasPropertyX (tail l)
and this is my first attempt at parallelism:
anyObjectHasPropertyXPar [] = False
anyObjectHasPropertyXPar [a] = hasPropertyX a
anyObjectHasPropertyXPar (a:b:rest) = runEval $ do c1 <- rpar (force (hasPropertyX a))
c2 <- rpar (force (hasPropertyX b))
rseq c1
rseq c2
if (c1 == True) || (c2 == True) then return True else return (anyObjectHasPropertyXPar rest)
This does run slightly faster than the naive code (even with -N1, oddly enough), but not by much (it helps a little by extending the number of parallel computations). I believe it's not benefitting much because it has to spark one thread for each element in the list.
Is there an approach similar to parListChunk that will only spark n threads and that allows for an early abort?
Edit: I'm having problems thinking about this because it seems that I would need to monitor the return value of all the threads. If I omit the rseq's and have something like
if (c1 == True) || (c2 == True) then ...
Is the runtime environment intelligent enough to monitor both threads and continue when either one of them returns?

I don't think you're going to have much luck using Control.Parallel.Strategies. A key feature of this module is that it expresses "deterministic parallelism" such that the result of the program is unaffected by the parallel evaluation. The problem you've described is fundamentally non-deterministic because threads are racing to find the first match.
Update: I see now that you're only returning True if the element is found, so the desired behavior is technically deterministic. So, perhaps there is a way to trick the Strategies module into working. Still, the implementation below seems to meet the requirements.
Here's an implementation of a parallel find parFind that runs in the IO monad using Control.Concurrent primitives and seems to do what you want. Two MVars are used: runningV keeps count of how many threads are still running to allow the last thread standing to detect search failure; and resultV is used to return Just the result or Nothing when search failure is detected by that last thread. Note that it is unlikely to perform better than a single-threaded implementation unless the test (your hasPropertyX above) is substantially more work than the list traversal, unlike this toy example.
import Control.Monad
import Control.Concurrent
import Data.List
import System.Environment
-- Thin a list to every `n`th element starting with index `i`
thin :: Int -> Int -> [a] -> [a]
thin i n = unfoldr step . drop i
where step [] = Nothing
step (y:ys) = Just (y, drop (n-1) ys)
-- Use `n` parallel threads to find first element of `xs` satisfying `f`
parFind :: Int -> (a -> Bool) -> [a] -> IO (Maybe a)
parFind n f xs = do
resultV <- newEmptyMVar
runningV <- newMVar n
comparisonsV <- newMVar 0
threads <- forM [0..n-1] $ \i -> forkIO $ do
case find f (thin i n xs) of
Just x -> void (tryPutMVar resultV (Just x))
Nothing -> do m <- takeMVar runningV
if m == 1
then void (tryPutMVar resultV Nothing)
else putMVar runningV (m-1)
result <- readMVar resultV
mapM_ killThread threads
return result
myList :: [Int]
myList = [1..1000000000]
-- Use `n` threads to find first element equal to `y` in `myList`
run :: Int -> Int -> IO ()
run n y = do x <- parFind n (== y) myList
print x
-- e.g., stack ghc -- -O2 -threaded SearchList.hs
-- time ./SearchList +RTS -N4 -RTS 4 12345 # find 12345 using 4 threads -> 0.018s
-- time ./SearchList +RTS -N4 -RTS 4 -1 # full search w/o match -> 6.7s
main :: IO ()
main = do [n,y] <- getArgs
run (read n) (read y)
Also, note that this version runs the threads on interleaved sublists rather than dividing the main list up into consecutive chunks. I did it this way because (1) it was easier to demonstrate that "early" elements were found quickly; and (2) my huge list means that memory usage can explode if the whole list needs to be kept in memory.
In fact, this example is a bit of a performance time bomb -- its memory usage is nondeterministic and can probably explode if one thread falls way behind so that a substantial portion of the whole list needs to be kept in memory.
In a real world example where the whole list is probably being kept in memory and the property test is expensive, you may find that breaking the list into chunks is faster.

Related

Unable to force strictness in tail recursion

I am dealing with memory leaks in my Haskell program and I was able to isolate it to very basic laziness problem in dealing with arrays. I understand what's happening there. First element of the array is computed while the rest produce the delayed computations which consumes the heap. Unfortunately, I was unable to force strictness for the entire array computation.
I tried various combinations of seq, BangPatterns, ($!) without much success.
import Control.Monad
force x = x `seq` x
loop :: [Int] -> IO ()
loop x = do
when (head x `mod` 10000000 == 0) $ print x
let x' = force $ map (+1) x
loop x'
main = loop $ replicate 200 1
The profile with standard profiling options didn't give me any more information than I already know:
ghc -prof -fprof-auto-calls -rtsopts test.hs
./test +RTS -M300M -p -hc
This runs out of memory in the matter of a few seconds.
force x = x `seq` x
That's useless. seq doesn't mean "evaluate this thing now"; it means "evaluate the left thing before returning the result of evaluating the right thing". When they're the same, it does nothing, and your force is equivalent to just id. Try this instead:
import Control.DeepSeq
import Control.Monad
loop :: [Int] -> IO ()
loop x = do
when (head x `mod` 10000000 == 0) $ print x
let x' = map (+1) x
loop $!! x'
main = loop $ replicate 200 1
That evaluates x' and everything in it before loop x', which is useful.
Alternatively, Control.DeepSeq has a force function that is useful. Its semantics in this case are "evaluate all of the elements of your list before returning the result of evaluating any of it". If you used its force function in place of your own, your original code would otherwise work, since the first line of loop does evaluate the beginning of the list.

Benchmarking Filter and Partition

I was testing the performance of the partition function for lists and got some strange results, I think.
We have that partition p xs == (filter p xs, filter (not . p) xs) but we chose the first implementation because it only performs a single traversal over the list. Yet, the results I got say that it maybe be better to use the implementation that uses two traversals.
Here is the minimal code that shows what I'm seeing
import Criterion.Main
import System.Random
import Data.List (partition)
mypartition :: (a -> Bool) -> [a] -> ([a],[a])
mypartition p l = (filter p l, filter (not . p) l)
randList :: RandomGen g => g -> Integer -> [Integer]
randList gen 0 = []
randList gen n = x:xs
where
(x, gen') = random gen
xs = randList gen' (n - 1)
main = do
gen <- getStdGen
let arg10000000 = randList gen 10000000
defaultMain [
bgroup "filters -- split list in half " [
bench "partition100" $ nf (partition (>= 50)) arg10000000
, bench "mypartition100" $ nf (mypartition (>= 50)) arg10000000
]
]
I ran the tests both with -O and without it and both times I get that the double traversals is better.
I am using ghc-7.10.3 with criterion-1.1.1.0
My questions are:
Is this expected?
Am I using Criterion correctly? I know that laziness can be tricky and (filter p xs, filter (not . p) xs) will only do two traversals if both elements of the tuple are used.
Does this has to do something with the way lists are handled in Haskell?
Thanks a lot!
There is no black or white answer to the question. To dissect the problem consider the following code:
import Control.DeepSeq
import Data.List (partition)
import System.Environment (getArgs)
mypartition :: (a -> Bool) -> [a] -> ([a],[a])
mypartition p l = (filter p l, filter (not . p) l)
main :: IO ()
main = do
let cnt = 10000000
xs = take cnt $ concat $ repeat [1 .. 100 :: Int]
args <- getArgs
putStrLn $ unwords $ "Args:" : args
case args of
[percent, fun]
-> let p = (read percent >=)
in case fun of
"partition" -> print $ rnf $ partition p xs
"mypartition" -> print $ rnf $ mypartition p xs
"partition-ds" -> deepseq xs $ print $ rnf $ partition p xs
"mypartition-ds" -> deepseq xs $ print $ rnf $ mypartition p xs
_ -> err
_ -> err
where
err = putStrLn "Sorry, I do not understand."
I do not use Criterion to have a better control about the order of evaluation. To get timings, I use the +RTS -s runtime option. The different test case are executed using different command line options. The first command line option defines for which percentage of the data the predicate holds. The second command line option chooses between different tests.
The tests distinguish two cases:
The data is generated lazily (2nd argument partition or mypartition).
The data is already fully evaluated in memory (2nd argument partition-ds or mypartition-ds).
The result of the partitioning is always evaluated from left to right, i.e. starting with the list that contains all the elements for which the predicate holds.
In case 1 partition has the advantage that elements of the first resulting list get discarded before all elements of the input list were even produced. Case 1 is especially good, if the predicate matches many elements, i.e. the first command line argument is large.
In case 2, partition cannot play out this advantage, since all elements are already in memory.
For mypartition, in any case all elements are held in memory after the first resulting list is evaluated, because they are needed again to compute the second resulting list. Therefore there is not much of a difference between the two cases.
It seems, the more memory is used, the harder garbage collection gets. Therefore partition is well suited, if the predicate matches many elements and the lazy variant is used.
Conversely, if the predicate does not match many elements or all elements are already in memory, mypartition performs better, since its recursion does not deal with pairs in contrast to partition.
The Stackoverflow question “Irrefutable pattern does not leak memory in recursion, but why?” might give some more insights about the handling of pairs in the recursion of partition.

Evaluate a List until a certain time has passed [duplicate]

I want to write a function that takes a time limit (in seconds) and a list, and computes as many elements of the list as possible within the time limit.
My first attempt was to first write the following function, which times a pure computation and returns the time elapsed along with the result:
import Control.DeepSeq
import System.CPUTime
type Time = Double
timed :: (NFData a) => a -> IO (a, Time)
timed x = do t1 <- getCPUTime
r <- return $!! x
t2 <- getCPUTime
let diff = fromIntegral (t2 - t1) / 10^12
return (r, diff)
I can then define the function I want in terms of this:
timeLimited :: (NFData a) => Time -> [a] -> IO [a]
timeLimited remaining [] = return []
timeLimited remaining (x:xs) = if remaining < 0
then return []
else do
(y,t) <- timed x
ys <- timeLimited (remaining - t) xs
return (y:ys)
This isn't quite right though. Even ignoring timing errors and floating point errors, this approach never stops the computation of an element of the list once it has started, which means that it can (and in fact, normally will) overrun its time limit.
If instead I had a function that could short-circuit evaluation if it had taken too long:
timeOut :: Time -> a -> IO (Maybe (a,t))
timeOut = undefined
then I could write the function that I really want:
timeLimited' :: Time -> [a] -> IO [a]
timeLimited' remaining [] = return []
timeLimited' remaining (x:xs) = do
result <- timeOut remaining x
case result of
Nothing -> return []
Just (y,t) -> do
ys <- timeLimited' (remaining - t) xs
return (y:ys)
My questions are:
How do I write timeOut?
Is there a better way to write the function timeLimited, for example, one that doesn't suffer from accumulated floating point error from adding up time differences multiple times?
Here's an example I was able to cook up using some of the suggestions above. I've not done huge amounts of testing to ensure work is cut off exactly when the timer runs out, but based on the docs for timeout, this should work for all things not using FFI.
import Control.Concurrent.STM
import Control.DeepSeq
import System.Timeout
type Time = Int
-- | Compute as many items of a list in given timeframe (microseconds)
-- This is done by running a function that computes (with `force`)
-- list items and pushed them onto a `TVar [a]`. When the requested time
-- expires, ghc will terminate the execution of `forceIntoTVar`, and we'll
-- return what has been pushed onto the tvar.
timeLimited :: (NFData a) => Time -> [a] -> IO [a]
timeLimited t xs = do
v <- newTVarIO []
_ <- timeout t (forceIntoTVar xs v)
readTVarIO v
-- | Force computed values into given tvar
forceIntoTVar :: (NFData a) => [a] -> TVar [a] -> IO [()]
forceIntoTVar xs v = mapM (atomically . modifyTVar v . forceCons) xs
-- | Returns function that does actual computation and cons' to tvar value
forceCons :: (NFData a) => a -> [a] -> [a]
forceCons x = (force x:)
Now let's try it on something costly:
main = do
xs <- timeLimited 100000 expensiveThing -- run for 100 milliseconds
print $ length $ xs -- how many did we get?
-- | Some high-cost computation
expensiveThing :: [Integer]
expensiveThing = sieve [2..]
where
sieve (p:xs) = p : sieve [x|x <- xs, x `mod` p > 0]
Compiled and run with time, it seems to work (obviously there is some overhead outside the timed portion, but I'm at roughly 100ms:
$ time ./timeLimited
1234
./timeLimited 0.10s user 0.01s system 97% cpu 0.112 total
Also, something to note about this approach; since I'm enclosing the entire operation of running the computations and pushing them onto the tvar inside one call to timeout, some time here is likely lost in creating the return structure, though I'm assuming (if your computations are costly) it won't account or much of your overall time.
Update
Now that I've had some time to think about it, due to Haskell's laziness, I'm not 100% positive the note above (about time-spent creating the return structure) is correct; either way, let me know if this is not precise enough for what you are trying to accomplish.
You can implement timeOut with the type you gave using timeout and evaluate. It looks something like this (I've omitted the part that computes how much time is left -- use getCurrentTime or similar for that):
timeoutPure :: Int -> a -> IO (Maybe a)
timeoutPure t a = timeout t (evaluate a)
If you want more forcing than just weak-head normal form, you can call this with an already-seq'd argument, e.g. timeoutPure (deepseq v) instead of timeoutPure v.
I would use two threads together with TVars and raise an exception (that causes every ongoing transaction to be rolled back) in the computation thread when the timeout has been reached:
forceIntoTVar :: (NFData a) => [a] -> TVar [a] -> IO [()]
forceIntoTVar xs v = mapM (atomically . modifyTVar v . forceCons) xs
-- | Returns function that does actual computation and cons' to tvar value
forceCons :: (NFData a) => a -> [a] -> [a]
forceCons x = (force x:)
main = do
v <- newTVarIO []
tID <- forkIO $ forceIntoTVar args v
threadDelay 200
killThread tID
readTVarIO v
In this example you (may) need to adjust forceIntoTVar a bit so that e.g. the list nodes are NOT computet inside the atomic transaction but first computed and then a atomic transaction is started to cons them onto the list.
In any case, when the exception is raised the ongoing transaction is rolled back or the ongoing computation is stopped before the result is consed to the list and that is what you want.
What you need to consider is that when the individual computations to prepare a node run with high frequency then this example is probably very costly compared to not using STM.

Compute as much of a list as possible in a fixed time

I want to write a function that takes a time limit (in seconds) and a list, and computes as many elements of the list as possible within the time limit.
My first attempt was to first write the following function, which times a pure computation and returns the time elapsed along with the result:
import Control.DeepSeq
import System.CPUTime
type Time = Double
timed :: (NFData a) => a -> IO (a, Time)
timed x = do t1 <- getCPUTime
r <- return $!! x
t2 <- getCPUTime
let diff = fromIntegral (t2 - t1) / 10^12
return (r, diff)
I can then define the function I want in terms of this:
timeLimited :: (NFData a) => Time -> [a] -> IO [a]
timeLimited remaining [] = return []
timeLimited remaining (x:xs) = if remaining < 0
then return []
else do
(y,t) <- timed x
ys <- timeLimited (remaining - t) xs
return (y:ys)
This isn't quite right though. Even ignoring timing errors and floating point errors, this approach never stops the computation of an element of the list once it has started, which means that it can (and in fact, normally will) overrun its time limit.
If instead I had a function that could short-circuit evaluation if it had taken too long:
timeOut :: Time -> a -> IO (Maybe (a,t))
timeOut = undefined
then I could write the function that I really want:
timeLimited' :: Time -> [a] -> IO [a]
timeLimited' remaining [] = return []
timeLimited' remaining (x:xs) = do
result <- timeOut remaining x
case result of
Nothing -> return []
Just (y,t) -> do
ys <- timeLimited' (remaining - t) xs
return (y:ys)
My questions are:
How do I write timeOut?
Is there a better way to write the function timeLimited, for example, one that doesn't suffer from accumulated floating point error from adding up time differences multiple times?
Here's an example I was able to cook up using some of the suggestions above. I've not done huge amounts of testing to ensure work is cut off exactly when the timer runs out, but based on the docs for timeout, this should work for all things not using FFI.
import Control.Concurrent.STM
import Control.DeepSeq
import System.Timeout
type Time = Int
-- | Compute as many items of a list in given timeframe (microseconds)
-- This is done by running a function that computes (with `force`)
-- list items and pushed them onto a `TVar [a]`. When the requested time
-- expires, ghc will terminate the execution of `forceIntoTVar`, and we'll
-- return what has been pushed onto the tvar.
timeLimited :: (NFData a) => Time -> [a] -> IO [a]
timeLimited t xs = do
v <- newTVarIO []
_ <- timeout t (forceIntoTVar xs v)
readTVarIO v
-- | Force computed values into given tvar
forceIntoTVar :: (NFData a) => [a] -> TVar [a] -> IO [()]
forceIntoTVar xs v = mapM (atomically . modifyTVar v . forceCons) xs
-- | Returns function that does actual computation and cons' to tvar value
forceCons :: (NFData a) => a -> [a] -> [a]
forceCons x = (force x:)
Now let's try it on something costly:
main = do
xs <- timeLimited 100000 expensiveThing -- run for 100 milliseconds
print $ length $ xs -- how many did we get?
-- | Some high-cost computation
expensiveThing :: [Integer]
expensiveThing = sieve [2..]
where
sieve (p:xs) = p : sieve [x|x <- xs, x `mod` p > 0]
Compiled and run with time, it seems to work (obviously there is some overhead outside the timed portion, but I'm at roughly 100ms:
$ time ./timeLimited
1234
./timeLimited 0.10s user 0.01s system 97% cpu 0.112 total
Also, something to note about this approach; since I'm enclosing the entire operation of running the computations and pushing them onto the tvar inside one call to timeout, some time here is likely lost in creating the return structure, though I'm assuming (if your computations are costly) it won't account or much of your overall time.
Update
Now that I've had some time to think about it, due to Haskell's laziness, I'm not 100% positive the note above (about time-spent creating the return structure) is correct; either way, let me know if this is not precise enough for what you are trying to accomplish.
You can implement timeOut with the type you gave using timeout and evaluate. It looks something like this (I've omitted the part that computes how much time is left -- use getCurrentTime or similar for that):
timeoutPure :: Int -> a -> IO (Maybe a)
timeoutPure t a = timeout t (evaluate a)
If you want more forcing than just weak-head normal form, you can call this with an already-seq'd argument, e.g. timeoutPure (deepseq v) instead of timeoutPure v.
I would use two threads together with TVars and raise an exception (that causes every ongoing transaction to be rolled back) in the computation thread when the timeout has been reached:
forceIntoTVar :: (NFData a) => [a] -> TVar [a] -> IO [()]
forceIntoTVar xs v = mapM (atomically . modifyTVar v . forceCons) xs
-- | Returns function that does actual computation and cons' to tvar value
forceCons :: (NFData a) => a -> [a] -> [a]
forceCons x = (force x:)
main = do
v <- newTVarIO []
tID <- forkIO $ forceIntoTVar args v
threadDelay 200
killThread tID
readTVarIO v
In this example you (may) need to adjust forceIntoTVar a bit so that e.g. the list nodes are NOT computet inside the atomic transaction but first computed and then a atomic transaction is started to cons them onto the list.
In any case, when the exception is raised the ongoing transaction is rolled back or the ongoing computation is stopped before the result is consed to the list and that is what you want.
What you need to consider is that when the individual computations to prepare a node run with high frequency then this example is probably very costly compared to not using STM.

Project Euler 23: insight on this stackoverflow-ing program needed

Hi haskell fellows. I'm currently working on the 23rd problem of Project Euler. Where I'm at atm is that my code seems right to me - not in the "good algorithm" meaning, but in the "should work" meaning - but produces a Stack memory overflow.
I do know that my algorithm isn't perfect (in particular I could certainly avoid computing such a big intermediate result at each recursion step in my worker function).
Though, being in the process of learning Haskell, I'd like to understand why this code fails so miserably, in order to avoid this kind of mistakes next time.
Any insight on why this program is wrong will be appreciated.
import qualified Data.List as Set ((\\))
main = print $ sum $ worker abundants [1..28123]
-- Limited list of abundant numbers
abundants :: [Int]
abundants = filter (\x -> (sum (divisors x)) - x > x) [1..28123]
-- Given a positive number, returns its divisors unordered.
divisors :: Int -> [Int]
divisors x | x > 0 = [1..squareRoot x] >>=
(\y -> if mod x y == 0
then let d = div x y in
if y == d
then [y]
else [y, d]
else [])
| otherwise = []
worker :: [Int] -> [Int] -> [Int]
worker (a:[]) prev = prev Set.\\ [a + a]
worker (a:as) prev = worker as (prev Set.\\ (map ((+) a) (a:as)))
-- http://www.haskell.org/haskellwiki/Generic_number_type#squareRoot
(^!) :: Num a => a -> Int -> a
(^!) x n = x^n
squareRoot :: Int -> Int
squareRoot 0 = 0
squareRoot 1 = 1
squareRoot n =
let twopows = iterate (^!2) 2
(lowerRoot, lowerN) =
last $ takeWhile ((n>=) . snd) $ zip (1:twopows) twopows
newtonStep x = div (x + div n x) 2
iters = iterate newtonStep (squareRoot (div n lowerN) * lowerRoot)
isRoot r = r^!2 <= n && n < (r+1)^!2
in head $ dropWhile (not . isRoot) iters
Edit: the exact error is Stack space overflow: current size 8388608 bytes.. Increasing the stack memory limit through +RTS -K... doesn't solve the problem.
Edit2: about the sqrt thing, I just copy pasted it from the link in comments. To avoid having to cast Integer to Doubles and face the rounding problems etc...
In the future, it's polite to attempt a bit of minimalization on your own. For example, with a bit of playing, I was able to discover that the following program also stack-overflows (with an 8M stack):
main = print (worker [1..1000] [1..1000])
...which really nails down just what function is screwing you over. Let's take a look at worker:
worker (a:[]) prev = prev Set.\\ [a + a]
worker (a:as) prev = worker as (prev Set.\\ (map ((+) a) (a:as)))
Even on my first read, this function was red-flagged in my mind, because it's tail-recursive. Tail recursion in Haskell is generally not such a great idea as it is in other languages; guarded recursion (where you produce at least one constructor before recursing, or recurse some small number of times before producing a constructor) is generally better for lazy evaluation. And in fact, here, what's happening is that each recursive call to worker is building a deeper- and deeper-ly nested thunk in the prev argument. When the time comes to finally return prev, we have to go very deeply into a long chain of Set.\\ calls to work out just what it was we finally have.
This problem is obfuscated slightly by the fact that the obvious strictness annotation doesn't help. Let's massage worker until it works. The first observation is that the first clause is completely subsumed by the second one. This is stylistic; it shouldn't affect the behavior (except on empty lists).
worker [] prev = prev
worker (a:as) prev = worker as (prev Set.\\ map (a+) (a:as))
Now, the obvious strictness annotation:
worker [] prev = prev
worker (a:as) prev = prev `seq` worker as (prev Set.\\ map (a+) (a:as))
I was surprised to discover that this still stack overflows! The sneaky thing is that seq on lists only evaluates far enough to learn whether the list matches either [] or _:_. The following does not stack overflow:
import Control.DeepSeq
worker [] prev = prev
worker (a:as) prev = prev `deepseq` worker as (prev Set.\\ map (a+) (a:as))
I didn't plug this final version back into the original code, but it at least works with the minimized main above. By the way, you might like the following implementation idea, which also stack overflows:
import Control.Monad
worker as bs = bs Set.\\ liftM2 (+) as as
but which can be fixed by using Data.Set instead of Data.List, and no strictness annotations:
import Control.Monad
import Data.Set as Set
worker as bs = toList (fromList bs Set.\\ fromList (liftM2 (+) as as))
As Daniel Wagner correctly said, the problem is that
worker (a:as) prev = worker as (prev Set.\\ (map ((+) a) (a:as)))
builds a badly nested thunk. You can avoid that and get somewhat better performance than with deepseq by exploiting the fact that both arguments to worker are sorted in this application. Thus you can get incremental output by noting that at any step everything in prev smaller than 2*a cannot be the sum of two abundant numbers, so
worker (a:as) prev = small ++ worker as (large Set.\\ map (+ a) (a:as))
where
(small,large) = span (< a+a) prev
does better. However, it's still bad because (\\) cannot use the sortedness of the two lists. If you replace it with
minus xxs#(x:xs) yys#(y:ys)
= case compare x y of
LT -> x : minus xs yys
EQ -> minus xs ys
GT -> minus xxs ys
minus xs _ = xs -- originally forgot the case for one empty list
(or use the data-ordlist package's version), calculating the set-difference is O(length) instead of O(length^2).
Ok, I loaded it up and gave it a shot. Daniel Wagner's advice is pretty good, probably better than mine. The problem is indeed with the worker function, but I was going to suggest using Data.MemoCombinators to memoize your function instead.
Also, your divisors algorithm is kind of silly. There's a much better way to do that. It's kind of mathy and would require a lot of TeX, so here's a link to a math.stackexchange page about how to do that. The one I was talking about, was the accepted answer, though someone else gives a recursive solution that I think would run faster. (It doesn't require prime factorization.)
https://math.stackexchange.com/questions/22721/is-there-a-formula-to-calculate-the-sum-of-all-proper-divisors-of-a-number

Resources