Haskell ways to the 3n+1 challenge - haskell

Here is a simple programming problem from SPOJ: http://www.spoj.com/problems/PROBTRES/.
Basically, you are asked to output the biggest Collatz cycle for numbers between i and j. (Collatz cycle of a number $n$ is the number of steps to eventually get from $n$ to 1.)
I have been looking for a Haskell way to solve the problem with comparative performance than that of Java or C++ (so as to fits in the allowed run-time limit). Although a simple Java solution that memoizes the cycle length of any already computed cycles will work. I haven't been successful at applying the idea to obtain a Haskell solution.
I have tried the Data.Function.Memoize, as well as home-brewed log time memoization technique using the idea from this post: Memoization in Haskell?. Unfortunately, memoization actually makes the computation of cycle(n) even slower. I believe the slow down comes from the overhead of haskell way. (I tried running with the compiled binary code, instead of interpreting.)
I also suspect that simply iterating numbers from i to j can be costly ($i,j\le10^6$). So I even tried precompute everything for the range query, using idea from http://blog.openendings.net/2013/10/range-trees-and-profiling-in-haskell.html. However, this still gives "Time Limit Exceeding" error.
Can you help to inform a neat competitive Haskell program for this?
Thanks!

>>> using the approach bellow, I could submit an accepted answer to SPOJ. You may check the entire code from here.
The problem has bounds 0 < n < 1,000,000. Pre-calculate all of them and store them inside an array; then freeze the array. The array can be used as its own cache / memoization space.
The problem would then reduce to a range query problem over an array, which can be done very efficiently using trees.
With the code bellow I can get Collatz of 1..1,000,000 in a fraction of a second:
$ time echo 1000000 | ./collatz
525
real 0m0.177s
user 0m0.173s
sys 0m0.003s
Note that collatz function below, uses mutable STUArray internally, but itself is a pure function:
import Control.Monad.ST (ST)
import Control.Monad (mapM_)
import Control.Applicative ((<$>))
import Data.Array.Unboxed (UArray, elems)
import Data.Array.ST (STUArray, readArray, writeArray, runSTUArray, newArray)
collatz :: Int -> UArray Int Int
collatz size = out
where
next i = if odd i then 3 * i + 1 else i `div` 2
loop :: STUArray s Int Int -> Int -> ST s Int
loop arr k
| size < k = succ <$> loop arr (next k)
| otherwise = do
out <- readArray arr k
if out /= 0 then return out
else do
out <- succ <$> loop arr (next k)
writeArray arr k out
return out
out = runSTUArray $ do
arr <- newArray (1, size) 0
writeArray arr 1 1
mapM_ (loop arr) [2..size]
return arr
main = do
size <- read <$> getLine
print . maximum . elems $ collatz size
In order to perform range queries on this array, you may build a balanced tree as simple as below:
type Range = (Int, Int)
data Tree = Leaf Int | Node Tree Tree Range Int
build_tree :: Int -> Tree
build_tree size = loop 1 cnt
where
ctz = collatz size
cnt = head . dropWhile (< size) $ iterate (*2) 1
(Leaf a) +: (Leaf b) = max a b
(Node _ _ _ a) +: (Node _ _ _ b) = max a b
loop lo hi
| lo == hi = Leaf $ if size < lo then minBound else ctz ! lo
| otherwise = Node left right (lo, hi) (left +: right)
where
i = (lo + hi) `div` 2
left = loop lo i
right = loop (i + 1) hi
query_tree :: Tree -> Int -> Int -> Int
query_tree (Leaf x) _ _ = x
query_tree (Node l r (lo, hi) x) i j
| i <= lo && hi <= j = x
| mid < i = query_tree r i j
| j < 1 + mid = query_tree l i j
| otherwise = max (query_tree l i j) (query_tree r i j)
where mid = (lo + hi) `div` 2

Here is the same as in the other answer, but with an immutable recursively defined array (and it also leaks slightly (can someone say why?) and so two times slower):
import Data.Array
upper = 10^6
step :: Integer -> Int
step i = 1 + colAt (if odd i then 3 * i + 1 else i `div` 2)
colAt :: Integer -> Int
colAt i | i > upper = step i
colAt i = col!i
col :: Array Integer Int
col = array (1, upper) $ (1, 1) : [(i, step i) | i <- [2..upper]]
main = print $ maximum $ elems col

Related

Haskell Space Leak

all.
While trying to solve some programming quiz:
https://www.hackerrank.com/challenges/missing-numbers
, I came across with space leak.
Main function is difference, which implements multi-set difference.
I've found out that List ':' and Triples (,,) kept on heaps
with -hT option profiling. However, only big lists are difference's
two arguments, and it shrinks as difference keeps on tail recursion.
But the memory consumed by lists keeps increasing as program runs.
Triples is ephemeral array structure, used for bookkeeping the count of multiset's each element. But the memory consumed by triples also
keeps increasing, and I cannot find out why.
Though I've browsed similar 'space leak' questions in stackoverflow,
I couldn't grasp the idea. Surely I have much to study.
I appreciate any comments. Thank you.
p.s) executable is compiled with -O2 switch.
$ ./difference -hT < input04.txt
Stack space overflow: current size 8388608 bytes.
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3
.
import Data.List
import Data.Array
-- array (non-zero-count, start-offset, array_data)
array_size=101
myindex :: Int -> Int -> Int
myindex key offset
| key >= offset = key - offset
| otherwise = key - offset + array_size
mylookup x (_,offset,arr) = arr ! idx
where idx = myindex x offset
addOrReplace :: Int -> Int -> (Int, Int, Array Int (Int,Int)) -> (Int, Int, Array Int (Int,Int))
addOrReplace key value (count,offset,arr) = (count', offset, arr // [(idx,(key,value))])
where idx = myindex key offset
(_,prev_value) = arr ! idx
count' = case (prev_value, value) of
(0,0) -> count
(0,_) -> count + 1
(_,0) -> count - 1
otherwise -> count
difference :: (Int,Int,Array Int (Int,Int)) -> [Int] -> [Int] -> [Int]
difference (count,offset,arr) [] []
| count == 0 = []
| otherwise = [ k | x <- [0..array_size-1], let (k,v) = (arr ! x), v /= 0]
difference m (x:xs) y = difference new_m xs y
where (_,v) = mylookup x m
new_m = addOrReplace x (v + 1) m
difference m [] (y:ys) = difference new_m [] ys
where (_,v) = mylookup y m
new_m = if v == 0
then m
else addOrReplace y (v - 1) m
main = do
n <- readLn :: IO Int
pp <- getLine
m <- readLn :: IO Int
qq <- getLine
let p = map (read :: String->Int) . words $ pp
q = map (read :: String->Int) . words $ qq
startArray = (0,head q, array (0,100) [(i,(0,0)) | i <- [0..100]] )
putStrLn . unwords . map show . sort $ difference startArray q p
[EDIT]
I seq'ed value and Array thanks to Carl's advice.
I attach heap diagram.
[original heap profiling]
[]1
[after seq'ing value v]
difference m (x:xs) y = difference new_m xs y
where (_,v) = mylookup x m
new_m = v `seq` addOrReplace x (v + 1) m
[after seq'ing value v and Array]
difference m (x:xs) y = new_m `seq` difference new_m xs y
where (_,v) = mylookup x m
new_m = v `seq` addOrReplace x (v + 1) m
I see three main problems with this code.
First (and not the cause of the memory use, but definitely the cause of generally poor performance) Array is horrible for this use case. O(1) lookups are useless when updates are O(n).
Speaking of, the values being stored in the Array aren't forced while difference is looping over its first input. They are thunks containing pointers to an unevaluated lookup in the previous version of the array. You can ensure that the value is evaluated at the same time the array is updated, in a variety of ways. When difference loops over its second input, it does this accidentally, in fact, by comparing the value against 0.
Third, difference doesn't even force the evaluation of the new arrays being created while traversing its first argument. Nothing requires the old array to be evaluated during that portion of the loop.
Both of those latter issues need to be resolved to fix the space leak. The first issue doesn't cause a space leak, just much higher overheads than needed.

Short-circuiting a function over a lower triangular(ish) array in Haskell: speed leads to ugly code

I've got a function, in my minimum example called maybeProduceValue i j, which is only valid when i > j. Note that in my actual code, the js are not uniform and so the data only resembles a triangular matrix, I don't know what the mathematical name for this is.
I'd like my code, which loops over i and j and returns essentially (where js is sorted)
[maximum [f i j | j <- js, j < i] | i <- [0..iMax]]
to not check any more j's once one has failed. In C-like languages, this is simple as
if (j >= i) {break;}
and I'm trying to recreate this behaviour in Haskell. I've got two implementations below:
one which tries to take advantage of laziness by using takeWhile to only inspect at most one value (per i) which fails the test and returns Nothing;
one which remembers the number of js which worked for the previous i and so, for i+1, it doesn't bother doing any safety checks until it exceeds this number.
This latter function is more than twice as fast by my benchmarks but it really is a mess - I'm trying to convince people that Haskell is more concise and safe while still reasonably performant and here is some fast code which is dense, cluttered and does a bunch of unsafe operations.
Is there a solution, perhaps using Cont, Error or Exception, that can achieve my desired behaviour?
n.b. I've tried using Traversable.mapAccumL and Vector.unfoldrN instead of State and they end up being about the same speed and clarity. It's still a very overcomplicated way of solving this problem.
import Criterion.Config
import Criterion.Main
import Control.DeepSeq
import Control.Monad.State
import Data.Maybe
import qualified Data.Traversable as T
import qualified Data.Vector as V
main = deepseq inputs $ defaultMainWith (defaultConfig{cfgSamples = ljust 10}) (return ()) [
bcompare [
bench "whileJust" $ nf whileJust js,
bench "memoised" $ nf memoisedSection js
]]
iMax = 5000
jMax = 10000
-- any sorted vector
js :: V.Vector Int
js = V.enumFromN 0 jMax
maybeProduceValue :: Int -> Int -> Maybe Float
maybeProduceValue i j | j < i = Just (fromIntegral (i+j))
| otherwise = Nothing
unsafeProduceValue :: Int -> Int -> Float
-- unsafeProduceValue i j | j >= i = error "you fool!"
unsafeProduceValue i j = fromIntegral (i+j)
whileJust, memoisedSection
:: V.Vector Int -> V.Vector Float
-- mean: 389ms
-- short circuits properly
whileJust inputs' = V.generate iMax $ \i ->
safeMax . V.map fromJust . V.takeWhile isJust $ V.map (maybeProduceValue i) inputs'
where safeMax v = if V.null v then 0 else V.maximum v
-- mean: 116ms
-- remembers the (monotonically increasing) length of the section of
-- the vector that is safe. I have tested that this doesn't violate the condition that j < i
memoisedSection inputs' = flip evalState 0 $ V.generateM iMax $ \i -> do
validSection <- state $ \oldIx ->
let newIx = oldIx + V.length (V.takeWhile (< i) (V.unsafeDrop oldIx inputs'))
in (V.unsafeTake newIx inputs', newIx)
return $ V.foldl' max 0 $ V.map (unsafeProduceValue i) validSection
Here's a simple way of solving the problem with Applicatives, provided that you don't need to keep the rest of the list once you run into an issue:
import Control.Applicative
memoizeSections :: Ord t => [(t, t)] -> Maybe [t]
memoizeSections [] = Just []
memoizeSections ((x, y):xs) = (:) <$> maybeProduceValue x y <*> memoizeSections xs
This is equivalent to:
import Data.Traversable
memoizeSections :: Ord t => [(t, t)] -> Maybe [t]
memoizeSections = flip traverse (uncurry maybeProduceValue)
and will return Nothing on the first occurrence of failure. Note that I don't know how fast this is, but it's certainly concise, and arguably pretty clear (particularly the first example).
Some minor comments:
-- any sorted vector
js :: V.Vector Int
js = V.enumFromN 0 jMax
If you have a vector of Ints (or Floats, etc), you want to use Data.Vector.Unboxed.
maybeProduceValue :: Int -> Int -> Maybe Float
maybeProduceValue i j | j < i = Just (fromIntegral (i+j))
| otherwise = Nothing
Since Just is lazy in its only field, this will create a thunk for the computation fromIntegral (i+j). You almost always want to apply Just like so
maybeProduceValue i j | j < i = Just $! fromIntegral (i+j)
There are some more thunks in:
memoisedSection inputs' = flip evalState 0 $ V.generateM iMax $ \i -> do
validSection <- state $ \oldIx ->
let newIx = oldIx + V.length (V.takeWhile (< i) (V.unsafeDrop oldIx inputs'))
in (V.unsafeTake newIx inputs', newIx)
return $ V.foldl' max 0 $ V.map (unsafeProduceValue i) validSection
Namely you want to:
let !newIx = oldIx + V.length (V.takeWhile (< i) (V.unsafeDrop oldIx inputs'))
!v = V.unsafeTake newIx inputs'
in (v, newIx)
as the pair is lazy in its fields and
return $! V.foldl' max 0 $ V.map (unsafeProduceValue i) validSection
because return in the state monad is lazy in the value.
You can use a guard in a single list comprehension:
[f i j | j <- js, i <- is, j < i]
If you're trying to get the same results as
[foo i j | i <- is, j <- js, j < i]
when you know that js is increasing, just write
[foo i j | i <- is, j <- takeWhile (< i) js]
There's no need to mess around with Maybe for this. Note that making the input list global has a likely-unfortunate effect: instead of fusing the production of the input list with its transformation(s) and ultimate consumption, it's forced to actually construct the list and then keep it in memory. It's quite possible that it will take longer to pull the list into cache from memory than to generate it piece by piece on the fly!

Anything prevents optimizing tail-recursion?

I'm solving a knapsack problem in Haskell using Dynamic Programing. My first attempt was to build a two-dimensional table. But the memory easily gets blown up when the input is large(e.g. a 100 * 3190802 table).
Knowing that any given row i only depends on the row (i - 1), I instead write a function in the hope to take the advantage of tail recursion:
import Data.Vector (Vector, (!))
import qualified Data.Vector as V
-- n items, k capacity, vs values, ws weights
ans:: Int -> Int -> Vector Int -> Vector Int -> Int
ans n k vs ws =
let row = initRow k vs ws
in row ! k
initRow :: Int -> Vector Int -> Vector Int -> Vector Int
initRow k vs ws = itbl 1 $ V.replicate (k + 1) 0
where n = V.length vs
itbl i row
| i > n = row
| otherwise = itbl (i + 1) $ V.generate (k + 1) gen
where gen w =
let w_i = ws ! (i - 1)
no_i = row ! w
ok_i = row ! (w - w_i) + (vs ! (i - 1))
in
if w < w_i then no_i
else max no_i ok_i
As shown in the code, itbl calls itself recursively and no further computation is made on its return value. However, I still see memory grow relentlessly in top:
VIRT PID USER PR NI RES SHR S %CPU %MEM TIME+ COMMAND
1214m 9878 root 20 0 424m 1028 S 40.8 85.6 0:16.80 ghc
Is there anything in the code that prevents the compiler to produce optimized code for tail recursion?
code
data
--
This is a strictness problem. The call to generate in
| otherwise = itbl (i + 1) $ V.generate (k + 1) gen
does not actually force the vector into memory.
You can either import Control.DeepSeq and replace $ by deeply strict application $!!:
| otherwise = itbl (i + 1) $!! V.generate (k + 1) gen
or you can use an unboxed vector (which is probably faster) instead, by changing the import statements to
import Data.Vector.Unboxed (Vector, (!))
import qualified Data.Vector.Unboxed as V
(and leaving everything else as in your original program).

Using dynamic programming in Haskell? [Warning: ProjectEuler 31 solution inside]

In solving projecteuler.net's problem #31 [SPOILERS AHEAD] (counting the number of ways to make 2£ with the British coins), I wanted to use dynamic programming. I started with OCaml, and wrote the short and very efficient following programming:
open Num
let make_dyn_table amount coins =
let t = Array.make_matrix (Array.length coins) (amount+1) (Int 1) in
for i = 1 to (Array.length t) - 1 do
for j = 0 to amount do
if j < coins.(i) then
t.(i).(j) <- t.(i-1).(j)
else
t.(i).(j) <- t.(i-1).(j) +/ t.(i).(j - coins.(i))
done
done;
t
let _ =
let t = make_dyn_table 200 [|1;2;5;10;20;50;100;200|] in
let last_row = Array.length t - 1 in
let last_col = Array.length t.(last_row) - 1 in
Printf.printf "%s\n" (string_of_num (t.(last_row).(last_col)))
This executes in ~8ms on my laptop. If I increase the amount from 200 pence to one million, the program still finds an answer in less than two seconds.
I translated the program to Haskell (which was definitely not fun in itself), and though it terminates with the right answer for 200 pence, if I increase that number to 10000, my laptop comes to a screeching halt (lots of thrashing). Here's the code:
import Data.Array
createDynTable :: Int -> Array Int Int -> Array (Int, Int) Int
createDynTable amount coins =
let numCoins = (snd . bounds) coins
t = array ((0, 0), (numCoins, amount))
[((i, j), 1) | i <- [0 .. numCoins], j <- [0 .. amount]]
in t
populateDynTable :: Array (Int, Int) Int -> Array Int Int -> Array (Int, Int) Int
populateDynTable t coins =
go t 1 0
where go t i j
| i > maxX = t
| j > maxY = go t (i+1) 0
| j < coins ! i = go (t // [((i, j), t ! (i-1, j))]) i (j+1)
| otherwise = go (t // [((i, j), t!(i-1,j) + t!(i, j - coins!i))]) i (j+1)
((_, _), (maxX, maxY)) = bounds t
changeCombinations amount coins =
let coinsArray = listArray (0, length coins - 1) coins
dynTable = createDynTable amount coinsArray
dynTable' = populateDynTable dynTable coinsArray
((_, _), (i, j)) = bounds dynTable
in
dynTable' ! (i, j)
main =
print $ changeCombinations 200 [1,2,5,10,20,50,100,200]
I'd love to hear from somebody who knows Haskell well why the performance of this solution is so bad.
Haskell is pure. The purity means that values are immutable, and thus in the step
j < coins ! i = go (t // [((i, j), t ! (i-1, j))]) i (j+1)
you create an entire new array for each entry you update. That's already very expensive for a small amount like £2, but it becomes utterly obscene for an amount of £100.
Furthermore, the arrays are boxed, that means they contain pointers to the entries, which worsens locality, uses more storage, and allows thunks to be built up that are also slower to evaluate when they finally are forced.
The used algorithm depends on a mutable data structure for its efficiency, but the mutability is confined to the computation, so we can use what is intended to allow safely shielded computations with temporarily mutable data, the ST state transformer monad family, and the associated [unboxed, for efficiency] arrays.
Give me half an hour or so to translate the algorithm into code using STUArrays, and you'll get a Haskell version that is not too ugly, and ought to perform comparably to the O'Caml version (some more or less constant factor is expected for the difference, whether it's larger or smaller than 1, I don't know).
Here it is:
module Main (main) where
import System.Environment (getArgs)
import Data.Array.ST
import Control.Monad.ST
import Data.Array.Unboxed
standardCoins :: [Int]
standardCoins = [1,2,5,10,20,50,100,200]
changeCombinations :: Int -> [Int] -> Int
changeCombinations amount coins = runST $ do
let coinBound = length coins - 1
coinsArray :: UArray Int Int
coinsArray = listArray (0, coinBound) coins
table <- newArray((0,0),(coinBound, amount)) 1 :: ST s (STUArray s (Int,Int) Int)
let go i j
| i > coinBound = readArray table (coinBound,amount)
| j > amount = go (i+1) 0
| j < coinsArray ! i = do
v <- readArray table (i-1,j)
writeArray table (i,j) v
go i (j+1)
| otherwise = do
v <- readArray table (i-1,j)
w <- readArray table (i, j - coinsArray!i)
writeArray table (i,j) (v+w)
go i (j+1)
go 1 0
main :: IO ()
main = do
args <- getArgs
let amount = case args of
a:_ -> read a
_ -> 200
print $ changeCombinations amount standardCoins
runs in not too shabby time,
$ time ./mutArr
73682
real 0m0.002s
user 0m0.000s
sys 0m0.001s
$ time ./mutArr 1000000
986687212143813985
real 0m0.439s
user 0m0.128s
sys 0m0.310s
and uses checked array accesses, using unchecked accesses, the time could be somewhat reduced.
Ah, I just learned that your O'Caml code uses arbitrary precision integers, so using Int in Haskell puts O'Caml at an unfair disadvantage. The changes necessary to calculate the results with arbitrary precision Integers are minmal,
$ diff mutArr.hs mutArrIgr.hs
12c12
< changeCombinations :: Int -> [Int] -> Int
---
> changeCombinations :: Int -> [Int] -> Integer
17c17
< table <- newArray((0,0),(coinBound, amount)) 1 :: ST s (STUArray s (Int,Int) Int)
---
> table <- newArray((0,0),(coinBound, amount)) 1 :: ST s (STArray s (Int,Int) Integer)
28c28
< writeArray table (i,j) (v+w)
---
> writeArray table (i,j) $! (v+w)
only two type signatures needed to be adapted - the array necessarily becomes boxed, so we need to make sure we're not writing thunks to the array in line 28, and
$ time ./mutArrIgr
73682
real 0m0.002s
user 0m0.000s
sys 0m0.002s
$ time ./mutArrIgr 1000000
99341140660285639188927260001
real 0m1.314s
user 0m1.157s
sys 0m0.156s
the computation with the large result that overflowed for Ints takes noticeably longer, but as expected comparable to the O'Caml.
Spending some time understanding the O'Caml, I can offer a closer, a bit shorter, and arguably nicer translation:
module Main (main) where
import System.Environment (getArgs)
import Data.Array.ST
import Control.Monad.ST
import Data.Array.Unboxed
import Control.Monad (forM_)
standardCoins :: [Int]
standardCoins = [1,2,5,10,20,50,100,200]
changeCombinations :: Int -> [Int] -> Integer
changeCombinations amount coins = runST $ do
let coinBound = length coins - 1
coinsArray :: UArray Int Int
coinsArray = listArray (0, coinBound) coins
table <- newArray((0,0),(coinBound, amount)) 1 :: ST s (STArray s (Int,Int) Integer)
forM_ [1 .. coinBound] $ \i ->
forM_ [0 .. amount] $ \j ->
if j < coinsArray!i
then do
v <- readArray table (i-1,j)
writeArray table (i,j) v
else do
v <- readArray table (i-1,j)
w <- readArray table (i, j - coinsArray!i)
writeArray table (i,j) $! (v+w)
readArray table (coinBound,amount)
main :: IO ()
main = do
args <- getArgs
let amount = case args of
a:_ -> read a
_ -> 200
print $ changeCombinations amount standardCoins
that runs about equally fast:
$ time ./mutArrIgrM 1000000
99341140660285639188927260001
real 0m1.440s
user 0m1.273s
sys 0m0.164s
You could take advantage of Haskell being lazy and not schedule the array filling yourself, but instead relying on lazy evaluation to do it in the right order. (For large inputs you'll need to increase the stack size.)
import Data.Array
createDynTable :: Integer -> Array Int Integer -> Array (Int, Integer) Integer
createDynTable amount coins =
let numCoins = (snd . bounds) coins
t = array ((0, 0), (numCoins, amount))
[((i, j), go i j) | i <- [0 .. numCoins], j <- [0 .. amount]]
go i j | i == 0 = 1
| j < coins ! i = t ! (i-1, j)
| otherwise = t ! (i-1, j) + t ! (i, j - coins!i)
in t
changeCombinations amount coins =
let coinsArray = listArray (0, length coins - 1) coins
dynTable = createDynTable amount coinsArray
((_, _), (i, j)) = bounds dynTable
in
dynTable ! (i, j)
main =
print $ changeCombinations 200 [1,2,5,10,20,50,100,200]

Project Euler 14: performance compared to C and memoization

I'm currently working on project euler problem 14.
I solved it using a poorly coded program, without memoization, that took 386 5 seconds to run (see edit).
Here it is:
step :: (Integer, Int) -> Integer -> (Integer, Int)
step (i, m) n | nextValue > m = (n, nextValue)
| otherwise = (i, m)
where nextValue = syr n 1
syr :: Integer -> Int -> Int
syr 1 acc = acc
syr x acc | even x = syr (x `div` 2) (acc + 1)
| otherwise = syr (3 * x + 1) (acc + 1)
p14 = foldl step (0, 0) [500000..999999]
My question is about several comments in the thread related to this problem, where were mentionned execution times of <1 s for programs as follow (C code, credits to the project euler forum user ix for the code -- note: I didn't check that the execution time is in fact as mentionned):
#include <stdio.h>
int main(int argc, char **argv) {
int longest = 0;
int terms = 0;
int i;
unsigned long j;
for (i = 1; i <= 1000000; i++) {
j = i;
int this_terms = 1;
while (j != 1) {
this_terms++;
if (this_terms > terms) {
terms = this_terms;
longest = i;
}
if (j % 2 == 0) {
j = j / 2;
} else {
j = 3 * j + 1;
}
}
}
printf("longest: %d (%d)\n", longest, terms);
return 0;
}
To me, those programs are kind of the same, when talking about the algorithm.
So I wonder why there is such a big difference? Or is there any fondamental difference between our two algorithms that can justify a x6 factor in performance?
BTW, I'm currently trying to implement this algorithm with memoization, but am kind of lost as to me, it's way easier to implement in an imperative language (and I don't manipulate monads yet so I can't use this paradigm). So if you have any good tutorial that fits a beginner to learn memoization, I'll be glad (the ones I encountered were not detailed enough or out of my league).
Note: I came to declarative paradigm through Prolog and am still in the very early process of discovering Haskell, so I might miss important things.
Note2: any general advice about my code is welcome.
EDIT: thanks to delnan's help, I compiled the program and it now runs in 5 seconds, so I mainly look for hints on memoization now (even if ideas about the existing x6 gap are still welcome).
After having compiled it with optimisations, there are still several differences to the C programme
you use div, while the C programme uses machine division (which truncates) [but any self-respecting C compiler transforms that into a shift, so that makes it yet faster], that would be quot in Haskell; that reduced the run time by some 15% here.
the C programme uses fixed-width 64-bit (or even 32-bit, but then it's just luck that it gets the correct answer, since some intermediate values exceed 32-bit range) integers, the Haskell programme uses arbitrary precision Integers. If you have 64-bit Ints in your GHC (64-bit OS other than Windows), replace Integer with Int. That reduced the run time by a factor of about 3 here. If you're on a 32-bit system, you're out of luck, GHC doesn't use native 64-bit instructions there, these operations are implemented as C calls, that's still rather slow.
For the memoisation, you can outsource it to one of the memoisation packages on hackage, the only one that I remember is data-memocombinators, but there are others. Or you can do it yourself, for example keeping a map of previously calculated values - that would work best in the State monad,
import Control.Monad.State.Strict
import qualified Data.Map as Map
import Data.Map (Map, singleton)
type Memo = Map Integer Int
syr :: Integer -> State Memo Int
syr n = do
mb <- gets (Map.lookup n)
case mb of
Just l -> return l
Nothing -> do
let m = if even n then n `quot` 2 else 3*n+1
l <- syr m
let l' = l+1
modify (Map.insert n l')
return l'
solve :: Integer -> Int -> Integer -> State Memo (Integer,Int)
solve maxi len start
| len > 1000000 = return (maxi,len)
| otherwise = do
l <- syr start
if len < l
then solve start l (start+1)
else solve maxi len (start+1)
p14 :: (Integer,Int)
p14 = evalState (solve 0 0 500000) (singleton 1 1)
but that will probably not gain too much (not even when you've added the necessary strictness). The trouble is that a lookup in a Map is not too cheap and an insertion is relatively expensive.
Another method is to keep a mutable array for the lookup. The code becomes more complicated, since you have to choose a reasonable upper bound for the values to cache (should be not much larger than the bound for the starting values) and deal with the parts of the sequences falling outside the memoised range. But an array lookup and write are fast. If you have 64-bit Ints, the below code runs pretty fast, here it takes 0.03s for a limit of 1 million, and 0.33s for a limit of 10 million, the corresponding (as closely as I reasonably could) C code runs in 0.018 resp. 0.2s.
module Main (main) where
import System.Environment (getArgs)
import Data.Array.ST
import Data.Array.Base
import Control.Monad.ST
import Data.Bits
import Data.Int
main :: IO ()
main = do
args <- getArgs
let bd = case args of
a:_ -> read a
_ -> 100000
print $ collMax bd
next :: Int -> Int
next n
| n .&. 1 == 0 = n `unsafeShiftR` 1
| otherwise = 3*n + 1
collMax :: Int -> (Int,Int16)
collMax upper = runST $ do
arr <- newArray (0,upper) 0 :: ST s (STUArray s Int Int16)
let go l m
| upper < m = go (l+1) $ next m
| otherwise = do
l' <- unsafeRead arr m
case l' of
0 -> do
l'' <- go 1 $ next m
unsafeWrite arr m (l'' + 1)
return (l+l'')
_ -> return (l+l'-1)
collect mi ml i
| upper < i = return (mi, ml)
| otherwise = do
l <- go 1 i
if l > ml
then collect i l (i+1)
else collect mi ml (i+1)
unsafeWrite arr 1 1
collect 1 1 2
Well, the C program uses unsigned long, but Integer can store arbitrarily large integers (it's a bignum). If you import Data.Word, then you can use Word, which is a machine-word-sized unsigned integer.
After replacing Integer with Word, and using ghc -O2 and gcc -O3, the C program runs in 0.72 seconds, while the Haskell programs runs in 1.92 seconds. 2.6x isn't bad. However, ghc -O2 doesn't always help, and this is one of the programs on which it doesn't! Using just -O, as you did, brings the runtime down to 1.90 seconds.
I tried replacing div with quot (which uses the same type of division as C; they only differ on negative inputs), but strangely it actually made the Haskell program run slightly slower for me.
You should be able to speed up the syr function with the help of this previous Stack Overflow question I answered about the same Project Euler problem.
On my current system (32-bit Core2Duo) your Haskell code, including all the optimizations given in the answers, takes 0.8s to compile and 1.2s to run.
You could transfer the run-time to compile-time, and reduce the run-time to nearly zero.
module Euler14 where
import Data.Word
import Language.Haskell.TH
terms :: Word -> Word
terms n = countTerms n 0
where
countTerms 1 acc = acc + 1
countTerms n acc | even n = countTerms (n `div` 2) (acc + 1)
| otherwise = countTerms (3 * n + 1) (acc + 1)
longestT :: Word -> Word -> (Word, Word)
longestT mi mx = find mi mx (0, 0)
where
find mi mx (ct,cn) | mi == mx = if ct > terms mi then (ct,cn) else (terms mi, mi)
| otherwise = find (mi + 1) mx
(if ct > terms mi then (ct,cn) else (terms mi, mi))
longest :: Word -> Word -> ExpQ
longest mi mx = return $ TupE [LitE (IntegerL (fromIntegral a)),
LitE (IntegerL (fromIntegral b))]
where
(a,b) = longestT mi mx
and
{-# LANGUAGE TemplateHaskell #-}
import Euler14
main = print $(longest 500000 999999)
On my system it takes 2.3s to compile this but the run-time goes down to 0.003s. Compile Time Function Execution (CTFE) is something you can't do in C/C++. The only other programming language that I know of that supports CTFE is the D programming language. And just to be complete, the C code takes 0.1s to compile and 0.7s to run.

Resources