Haskell solution for InterviewStreet String Similarity challenge

Haskell solution for InterviewStreet String Similarity challenge - string

This is my best attempt to solve the String Similarity challenge for InterviewStreet.
import Control.Monad
import Data.Text as T
import qualified Data.Text.IO as TIO
sumSimilarities s = (T.length s) + (sum $ Prelude.map (similarity s) (Prelude.tail $ tails s))
similarity :: Text -> Text -> Int
similarity a b = case commonPrefixes a b of
Just (x,_,_) -> T.length x
Nothing -> 0
main = do
cases <- fmap read getLine
inputs <- replicateM cases TIO.getLine
forM_ inputs $ print . sumSimilarities
It only passes 7/10 of the test cases. Test cases 7, 8, and 9 fail because they exceed the allotted execution time.
I'm half trying to verify that this is indeed possible to solve in Haskell and half looking for what an optimized Haskell program looks like.
Thanks!
Tyler

Like user5402, I'd be curious whether an equivalent (for certain values of equivalent) C programme would finish within the time limit or also time out. If it would, it would be interesting to see whether an equivalent programme using ByteStrings could finish in time. - Not that ByteStrings are per se faster than Text, but since the input must be converted to the internal representation of Text while ByteString takes it as is, that might make a difference. Another possible reason that ByteStrings might be faster - if the testing machines have 32-bit GHCs - would be that text's fusion at least used to need more registers than generally available on 32 bit architectures to get full profit [a long time ago, in the days of text-0.5 to text-0.7, on my 32-bit box, bytestring used to be quite a bit faster, no idea whether that still holds for newer text versions].
Okay, since user5402 has verified that the naïve algorithm is fast enough in C, I've gone ahead and wrote an implementation of the naïve algorithm using ByteStrings
{-# LANGUAGE BangPatterns #-}
module Main (main) where
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Unsafe as U
import Control.Monad
import Data.Word
main :: IO ()
main = do
cl <- C.getLine
case C.readInt cl of
Just (cases,_) -> replicateM_ cases (C.getLine >>= print . similarity)
Nothing -> return ()
-- Just to keep the condition readable.
(?) :: B.ByteString -> Int -> Word8
(?) = U.unsafeIndex
similarity :: B.ByteString -> Int
similarity bs
| len == 0 = 0
| otherwise = go len 1
where
!len = B.length bs
go !acc i
| i < len = go (acc + prf 0 i) (i+1)
| otherwise = acc
prf !k j
| j < len && bs ? k == bs ? j = prf (k+1) (j+1)
| otherwise = k
and compared it to the OP's Text version on some bad cases. On my box, that is more than four times faster than the Text version, so it'd be interesting whether that's fast enough (the C version is another 4.5 times faster, so it may well not be).
However, I consider it more likely that the time limit is exceeded due to using the naïve algorithm that has quadratic worst-case behaviour. Probably there are test cases that evoke the worst-case for the naïve algorithm.
So the solution would be to use an algorithm that scales better, optimally linear. One linear algorithm to compute the similarity of a string is the Z-algorithm.
The idea is simple (but, like most good ideas, not easy to have). Let us call a (non-empty) substring that is also a prefix of the string a prefix-substring. To avoid recomputation, the algorithm uses a window of the prefix-substring starting before the currently considered index that extends farthest to the right (initially, the window is empty).
Variables used and invariants of the algorithm:
i, the index under consideration, starts at 1 (for 0-based indexing; the entire string is not considered) and is incremented to length - 1
left and right, the first and last index of the prefix-substring window; invariants:
left < i, left <= right < length(S), either left > 0 or right < 1,
if left > 0, then S[left .. right] is the maximal common prefix of S and S[left .. ],
if 1 <= j < i and S[j .. k] is a prefix of S, then k <= right
An array Z, invariant: for 1 <= k < i, Z[k] contains the length of the longest common prefix of S[k .. ] and S.
The algorithm:
Set i = 1, left = right = 0 (any values with left <= right < 1 are allowed), and set Z[j] = 0 for all indices 1 <= j < length(S).
If i == length(S), stop.
If i > right, find the length l of the longest common prefix of S and S[i .. ], store it in Z[i]. If l > 0 we have found a window extending farther right than the previous, then set left = i and right = i+l-1, otherwise leave them unchanged. Increment i and go to 2.
Here left < i <= right, so the substring S[i .. right] is known - since S[left .. right] is a prefix of S, it is equal to S[i-left .. right-left].
Now consider the longest common prefix of S with the substring starting at index i - left.
Its length is Z[i-left], hence S[k] = S[i-left + k] for 0 <= k < Z[i-left] and
S[Z[i-left]] ≠ S[i-left+Z[i-left]]. Now, if Z[i-left] <= right-i, then i + Z[i-left] is inside the known window, therefore
S[i + Z[i-left]] = S[i-left + Z[i-left]] ≠ S[Z[i-left]]
S[i + k] = S[i-left + k] = S[k] for 0 <= k < Z[i-left]
and we see that the length of the longest common prefix of S and S[i .. ] has length Z[i-left].
Then set Z[i] = Z[i-left], increment i, and go to 2.
Otherwise, S[i .. right] is a prefix of S and we check how far it extends, starting the comparison of characters at the indices right+1 and right+1 - i. Let the length be l. Set Z[i] = l, left = i, right = i + l - 1, increment i, and go to 2.
Since the window never moves left, and the comparisons always start after the end of the window, each character in the string is compared at most once successfully to an earlier character in the string, and for each starting index, there is at most one unsuccessful comparison, therefore the algorithm is linear.
The code (using ByteString out of habit, ought to be trivially portable to Text):
{-# LANGUAGE BangPatterns #-}
module Main (main) where
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Unsafe as U
import Data.Array.ST
import Data.Array.Base
import Control.Monad.ST
import Control.Monad
import Data.Word
main :: IO ()
main = do
cl <- C.getLine
case C.readInt cl of
Just (cases,_) -> replicateM_ cases (C.getLine >>= print . similarity)
Nothing -> return ()
-- Just to keep the condition readable.
(?) :: B.ByteString -> Int -> Word8
(?) = U.unsafeIndex
-- Calculate the similarity of a string using the Z-algorithm
similarity :: B.ByteString -> Int
similarity bs
| len == 0 = 0
| otherwise = runST getSim
where
!len = B.length bs
getSim = do
za <- newArray (0,len-1) 0 :: ST s (STUArray s Int Int)
-- The common prefix of the string with itself is entire string.
unsafeWrite za 0 len
let -- Find the length of the common prefix.
go !k j
| j < len && (bs ? j == bs ? k) = go (k+1) (j+1)
| otherwise = return k
-- The window with indices in [left .. right] is the prefix-substring
-- starting before i that extends farthest.
loop !left !right i
| i >= len = count 0 0 -- when done, sum
| i > right = do
-- We're outside the window, simply
-- find the length of the common prefix
-- and store it in the Z-array.
w <- go 0 i
unsafeWrite za i w
if w > 0
-- We got a non-empty common prefix and a new window.
then loop i (i+w-1) (i+1)
-- No new window, same procedure at next index.
else loop left right (i+1)
| otherwise = do
-- We're inside the window, so the substring starting at
-- (i - left) has a common prefix with the substring
-- starting at i of length at least (right - i + 1)
-- (since the [left .. right] window is a prefix of bs).
-- But we already know how long the common prefix
-- starting at (i - left) is.
z <- unsafeRead za (i-left)
let !s = right-i+1 -- length of known prefix starting at i
if z < s
-- If the common prefix of the substring starting at
-- (i - left) is shorter than the rest of the window,
-- the common prefix of the substring starting at i
-- is the same. Store it and move on with the same window.
then do
unsafeWrite za i z
loop left right (i+1)
else do
-- Otherwise, find out how far the common prefix
-- extends, starting at (right + 1) == s + i.
w <- go s (s+i)
unsafeWrite za i w
loop i (i+w-1) (i+1)
count !acc i
| i == len = return acc
| otherwise = do
n <- unsafeRead za i
count (acc+n) (i+1)
loop 0 0 1

Related

Haskell ways to the 3n+1 challenge

Here is a simple programming problem from SPOJ: http://www.spoj.com/problems/PROBTRES/.
Basically, you are asked to output the biggest Collatz cycle for numbers between i and j. (Collatz cycle of a number $n$ is the number of steps to eventually get from $n$ to 1.)
I have been looking for a Haskell way to solve the problem with comparative performance than that of Java or C++ (so as to fits in the allowed run-time limit). Although a simple Java solution that memoizes the cycle length of any already computed cycles will work. I haven't been successful at applying the idea to obtain a Haskell solution.
I have tried the Data.Function.Memoize, as well as home-brewed log time memoization technique using the idea from this post: Memoization in Haskell?. Unfortunately, memoization actually makes the computation of cycle(n) even slower. I believe the slow down comes from the overhead of haskell way. (I tried running with the compiled binary code, instead of interpreting.)
I also suspect that simply iterating numbers from i to j can be costly ($i,j\le10^6$). So I even tried precompute everything for the range query, using idea from http://blog.openendings.net/2013/10/range-trees-and-profiling-in-haskell.html. However, this still gives "Time Limit Exceeding" error.
Can you help to inform a neat competitive Haskell program for this?
Thanks!

>>> using the approach bellow, I could submit an accepted answer to SPOJ. You may check the entire code from here.
The problem has bounds 0 < n < 1,000,000. Pre-calculate all of them and store them inside an array; then freeze the array. The array can be used as its own cache / memoization space.
The problem would then reduce to a range query problem over an array, which can be done very efficiently using trees.
With the code bellow I can get Collatz of 1..1,000,000 in a fraction of a second:
$ time echo 1000000 | ./collatz
525
real 0m0.177s
user 0m0.173s
sys 0m0.003s
Note that collatz function below, uses mutable STUArray internally, but itself is a pure function:
import Control.Monad.ST (ST)
import Control.Monad (mapM_)
import Control.Applicative ((<$>))
import Data.Array.Unboxed (UArray, elems)
import Data.Array.ST (STUArray, readArray, writeArray, runSTUArray, newArray)
collatz :: Int -> UArray Int Int
collatz size = out
where
next i = if odd i then 3 * i + 1 else i `div` 2
loop :: STUArray s Int Int -> Int -> ST s Int
loop arr k
| size < k = succ <$> loop arr (next k)
| otherwise = do
out <- readArray arr k
if out /= 0 then return out
else do
out <- succ <$> loop arr (next k)
writeArray arr k out
return out
out = runSTUArray $ do
arr <- newArray (1, size) 0
writeArray arr 1 1
mapM_ (loop arr) [2..size]
return arr
main = do
size <- read <$> getLine
print . maximum . elems $ collatz size
In order to perform range queries on this array, you may build a balanced tree as simple as below:
type Range = (Int, Int)
data Tree = Leaf Int | Node Tree Tree Range Int
build_tree :: Int -> Tree
build_tree size = loop 1 cnt
where
ctz = collatz size
cnt = head . dropWhile (< size) $ iterate (*2) 1
(Leaf a) +: (Leaf b) = max a b
(Node _ _ _ a) +: (Node _ _ _ b) = max a b
loop lo hi
| lo == hi = Leaf $ if size < lo then minBound else ctz ! lo
| otherwise = Node left right (lo, hi) (left +: right)
where
i = (lo + hi) `div` 2
left = loop lo i
right = loop (i + 1) hi
query_tree :: Tree -> Int -> Int -> Int
query_tree (Leaf x) _ _ = x
query_tree (Node l r (lo, hi) x) i j
| i <= lo && hi <= j = x
| mid < i = query_tree r i j
| j < 1 + mid = query_tree l i j
| otherwise = max (query_tree l i j) (query_tree r i j)
where mid = (lo + hi) `div` 2

Here is the same as in the other answer, but with an immutable recursively defined array (and it also leaks slightly (can someone say why?) and so two times slower):
import Data.Array
upper = 10^6
step :: Integer -> Int
step i = 1 + colAt (if odd i then 3 * i + 1 else i `div` 2)
colAt :: Integer -> Int
colAt i | i > upper = step i
colAt i = col!i
col :: Array Integer Int
col = array (1, upper) $ (1, 1) : [(i, step i) | i <- [2..upper]]
main = print $ maximum $ elems col

Short-circuiting a function over a lower triangular(ish) array in Haskell: speed leads to ugly code

I've got a function, in my minimum example called maybeProduceValue i j, which is only valid when i > j. Note that in my actual code, the js are not uniform and so the data only resembles a triangular matrix, I don't know what the mathematical name for this is.
I'd like my code, which loops over i and j and returns essentially (where js is sorted)
[maximum [f i j | j <- js, j < i] | i <- [0..iMax]]
to not check any more j's once one has failed. In C-like languages, this is simple as
if (j >= i) {break;}
and I'm trying to recreate this behaviour in Haskell. I've got two implementations below:
one which tries to take advantage of laziness by using takeWhile to only inspect at most one value (per i) which fails the test and returns Nothing;
one which remembers the number of js which worked for the previous i and so, for i+1, it doesn't bother doing any safety checks until it exceeds this number.
This latter function is more than twice as fast by my benchmarks but it really is a mess - I'm trying to convince people that Haskell is more concise and safe while still reasonably performant and here is some fast code which is dense, cluttered and does a bunch of unsafe operations.
Is there a solution, perhaps using Cont, Error or Exception, that can achieve my desired behaviour?
n.b. I've tried using Traversable.mapAccumL and Vector.unfoldrN instead of State and they end up being about the same speed and clarity. It's still a very overcomplicated way of solving this problem.
import Criterion.Config
import Criterion.Main
import Control.DeepSeq
import Control.Monad.State
import Data.Maybe
import qualified Data.Traversable as T
import qualified Data.Vector as V
main = deepseq inputs $ defaultMainWith (defaultConfig{cfgSamples = ljust 10}) (return ()) [
bcompare [
bench "whileJust" $ nf whileJust js,
bench "memoised" $ nf memoisedSection js
]]
iMax = 5000
jMax = 10000
-- any sorted vector
js :: V.Vector Int
js = V.enumFromN 0 jMax
maybeProduceValue :: Int -> Int -> Maybe Float
maybeProduceValue i j | j < i = Just (fromIntegral (i+j))
| otherwise = Nothing
unsafeProduceValue :: Int -> Int -> Float
-- unsafeProduceValue i j | j >= i = error "you fool!"
unsafeProduceValue i j = fromIntegral (i+j)
whileJust, memoisedSection
:: V.Vector Int -> V.Vector Float
-- mean: 389ms
-- short circuits properly
whileJust inputs' = V.generate iMax $ \i ->
safeMax . V.map fromJust . V.takeWhile isJust $ V.map (maybeProduceValue i) inputs'
where safeMax v = if V.null v then 0 else V.maximum v
-- mean: 116ms
-- remembers the (monotonically increasing) length of the section of
-- the vector that is safe. I have tested that this doesn't violate the condition that j < i
memoisedSection inputs' = flip evalState 0 $ V.generateM iMax $ \i -> do
validSection <- state $ \oldIx ->
let newIx = oldIx + V.length (V.takeWhile (< i) (V.unsafeDrop oldIx inputs'))
in (V.unsafeTake newIx inputs', newIx)
return $ V.foldl' max 0 $ V.map (unsafeProduceValue i) validSection

Here's a simple way of solving the problem with Applicatives, provided that you don't need to keep the rest of the list once you run into an issue:
import Control.Applicative
memoizeSections :: Ord t => [(t, t)] -> Maybe [t]
memoizeSections [] = Just []
memoizeSections ((x, y):xs) = (:) <$> maybeProduceValue x y <*> memoizeSections xs
This is equivalent to:
import Data.Traversable
memoizeSections :: Ord t => [(t, t)] -> Maybe [t]
memoizeSections = flip traverse (uncurry maybeProduceValue)
and will return Nothing on the first occurrence of failure. Note that I don't know how fast this is, but it's certainly concise, and arguably pretty clear (particularly the first example).

Some minor comments:
-- any sorted vector
js :: V.Vector Int
js = V.enumFromN 0 jMax
If you have a vector of Ints (or Floats, etc), you want to use Data.Vector.Unboxed.
maybeProduceValue :: Int -> Int -> Maybe Float
maybeProduceValue i j | j < i = Just (fromIntegral (i+j))
| otherwise = Nothing
Since Just is lazy in its only field, this will create a thunk for the computation fromIntegral (i+j). You almost always want to apply Just like so
maybeProduceValue i j | j < i = Just $! fromIntegral (i+j)
There are some more thunks in:
memoisedSection inputs' = flip evalState 0 $ V.generateM iMax $ \i -> do
validSection <- state $ \oldIx ->
let newIx = oldIx + V.length (V.takeWhile (< i) (V.unsafeDrop oldIx inputs'))
in (V.unsafeTake newIx inputs', newIx)
return $ V.foldl' max 0 $ V.map (unsafeProduceValue i) validSection
Namely you want to:
let !newIx = oldIx + V.length (V.takeWhile (< i) (V.unsafeDrop oldIx inputs'))
!v = V.unsafeTake newIx inputs'
in (v, newIx)
as the pair is lazy in its fields and
return $! V.foldl' max 0 $ V.map (unsafeProduceValue i) validSection
because return in the state monad is lazy in the value.

You can use a guard in a single list comprehension:
[f i j | j <- js, i <- is, j < i]

If you're trying to get the same results as
[foo i j | i <- is, j <- js, j < i]
when you know that js is increasing, just write
[foo i j | i <- is, j <- takeWhile (< i) js]
There's no need to mess around with Maybe for this. Note that making the input list global has a likely-unfortunate effect: instead of fusing the production of the input list with its transformation(s) and ultimate consumption, it's forced to actually construct the list and then keep it in memory. It's quite possible that it will take longer to pull the list into cache from memory than to generate it piece by piece on the fly!

Why I take a message for stack overflow in Haskell?

import Data.Vector hiding((++))
import System.Environment
d = generate 1000000 (\z->case z of
0 -> 2
1 -> 3
2 -> 5
otherwise -> if odd z then (d ! (z-1)) +2 else (d ! (z-1)) + 4)
algorithmA _ _ 1 pt = pt
algorithmA t k n pt = let dk = d ! k
q = div n dk
r = mod n dk
in if r /=0 then
if q>dk then
algorithmA t (k+1) n pt
else (n:pt)
else
algorithmA (t+1) k q (dk:pt)
main = do
args<-getArgs
let n = read (args !! 0)
if (floor(sqrt(fromInteger n))) > Data.Vector.last d then error ("The square root of number is greater than " ++ show (Data.Vector.last d))
else
print (algorithmA 0 0 n [])
When I compile the above program and give for example in the command line test1 2222 I take the message "Stake space overflow: current size ... use +RTS -Ksize -RTS to increase ... ". But when I delete the if in the main function then the program works without problem. Also if I give the command Data.Vector.last d in the ghci the value is calculated without problem. So why this message is printed? When I increase the stack size to 20M the program plays without problem.
The test1 is the name of executable.
Thanks.

The problem is that your code is being too lazy when constructing d. Remember that Data.Vector.Vector is a boxed vector type - that is, it is represented internally as an array of pointers to heap objects (which are either values or unevaluated thunks). So when you're populating d with generate, you are actually creating a vector of thunks. In your example, when the thunk at position n is accessed, it triggers the evaluation of thunks at positions n-1 and n-2, which in turn triggers evaluation of thunks n-3, n-4, n-5 and so on. So evaluating the last element causes the previous 1000000 - 1 elements to be evaluated, causing the stack to grow. This is why you get the stack overflow error.
An easy way to fix this without modifying your code is to fully evaluate the whole vector before accessing the last element. In that case all thunks are evaluated in order and there is no stack overflow (since once a thunk has been evaluated, it's replaced with the value of the expression it represented, so when you're evaluating element n after having already evaluated elements n-1 and n-2, only those two elements have to be accessed and the cascading evaluation of all previous thunks is not triggered):
import Control.DeepSeq (($!!))
...
let l = V.last $!! d
...
Testing:
$ ghc -O2 Test.hs
[1 of 1] Compiling Main ( Test.hs, Test.o )
Linking Test ...
$ ./Test 2222
[101,11,2]
Alternatively, you can use unboxed vectors (flat arrays of Ints):
d :: U.Vector Int
d = U.create $ do
v <- M.new dSize
go 0 v
where
dSize = 1000000
go i v | i >= dSize = return v
| otherwise = do
val <- case i of
0 -> return 2
1 -> return 3
2 -> return 5
_ -> if odd i
then (+2) <$> (M.read v (i-1))
else (+4) <$> (M.read v (i-1))
M.write v i val
go (i+1) v

Is it possible to use Haskell to reasonably solve large DP problems

I wrote code for solving the local alignment problem with Smith–Waterman algorithm.
I want to do this with input of strings with length 10000, with reasonable memory(under 2GB ram) and reasonable time (under 5 minutes).
At first I was using bio library's built in function for this, and it runs way too slow and eat up 4GB of ram before I killed it.
Note the java program jAligner, which implements the same algorithm, can solve this problem with less than 1GB of memory and less than 20 seconds.
When I wrote an unboxed version of this, the program gives me <<loop>>. I think it's because the array need to access items in the array before the array gets built entirely.
So I wonder is it even possible to write Haskell code with similar performance for this kind of larger dynamic programming problems.
module LocalAlign where
--import Data.Array.Unboxed
import Data.Tuple
import Data.Array
localAffineAlignment :: (Char -> Char -> Int)
-> Int
-> Int
-> String
-> String
-> (Int, (String, String, String, String))
localAffineAlignment f g e s' t' = (score, best) where
n = length s'
m = length t'
s= array (0,n-1) $ zip [0..n-1] s'
t= array (0,m-1) $ zip [0..m-1] t'
table :: (Array (Int,Int) Int,Array (Int,Int) Int)
table = (c,d)
where --a :: UArray (Int,Int) Int
a = array ((0,0),(n,m)) [((x,y),a' x y)|x<-[0..n],y<-[0..m]] --s end with gap
b = array ((0,0),(n,m)) [((x,y),b' x y)|x<-[0..n],y<-[0..m]] --t end with gap
c = array ((0,0),(n,m)) [((x,y),fst (c' x y))|x<-[0..n],y<-[0..m]] -- best
d = array ((0,0),(n,m)) [((x,y),snd (c' x y))|x<-[0..n],y<-[0..m]] -- direction
a' i j
| i==0 || j==0 = inf
| otherwise = max (a!(i-1,j)-e) (c!(i-1,j)-g-e)
b' i j
| i==0 || j==0 = inf
| otherwise = max (b!(i,j-1)-e) (c!(i,j-1)-g-e)
c' i j
| min i j == 0 = (0,0)
| otherwise = maximum [(b!(i,j),3),(a!(i,j),2),(c!(i-1,j-1) + f u v,1),(0,0)]
where u = s!(i-1)
v = t!(j-1)
inf = -1073741824
score :: Int
score = maximum $ elems $ fst table
best :: (String, String, String, String)
best = (drop si $ take ei s',drop sj $ take ej t',b1,b2)
where (a,d') = table
(si,sj,b1,b2) = build ei ej [] []
(ei,ej) = snd $ maximum $ map swap $ assocs a
build x y ss tt
| o == 0 = (x,y,ss,tt)
| d == 1 = build (x-1) (y-1) (u:ss) (v:tt)
| d == 2 = build (x-1) y (u:ss) ('-':tt)
| otherwise = build x (y-1) ('-':ss) (v:tt)
where o = a!(x,y)
d = d'!(x,y)
u = s!(x-1)
v = t!(y-1)

is it even possible to write Haskell code with similar performance for this kind of larger dynamic programming problems.
Yes, of course. Use the same data structures and the same algorithms, and you will get same (or better, or worse, by constant factors) performance.
You are using (intermediate) lists and boxed arrays heavily. Consider using the vector package instead.

You might be interested in the MemoCombinators library, which makes doing dynamic programming much easier. You can basically write the algorithm without memoization, then just annotate which variables you want memoized, and the compiler takes it from there.

Project Euler 14: performance compared to C and memoization

I'm currently working on project euler problem 14.
I solved it using a poorly coded program, without memoization, that took 386 5 seconds to run (see edit).
Here it is:
step :: (Integer, Int) -> Integer -> (Integer, Int)
step (i, m) n | nextValue > m = (n, nextValue)
| otherwise = (i, m)
where nextValue = syr n 1
syr :: Integer -> Int -> Int
syr 1 acc = acc
syr x acc | even x = syr (x `div` 2) (acc + 1)
| otherwise = syr (3 * x + 1) (acc + 1)
p14 = foldl step (0, 0) [500000..999999]
My question is about several comments in the thread related to this problem, where were mentionned execution times of <1 s for programs as follow (C code, credits to the project euler forum user ix for the code -- note: I didn't check that the execution time is in fact as mentionned):
#include <stdio.h>
int main(int argc, char **argv) {
int longest = 0;
int terms = 0;
int i;
unsigned long j;
for (i = 1; i <= 1000000; i++) {
j = i;
int this_terms = 1;
while (j != 1) {
this_terms++;
if (this_terms > terms) {
terms = this_terms;
longest = i;
}
if (j % 2 == 0) {
j = j / 2;
} else {
j = 3 * j + 1;
}
}
}
printf("longest: %d (%d)\n", longest, terms);
return 0;
}
To me, those programs are kind of the same, when talking about the algorithm.
So I wonder why there is such a big difference? Or is there any fondamental difference between our two algorithms that can justify a x6 factor in performance?
BTW, I'm currently trying to implement this algorithm with memoization, but am kind of lost as to me, it's way easier to implement in an imperative language (and I don't manipulate monads yet so I can't use this paradigm). So if you have any good tutorial that fits a beginner to learn memoization, I'll be glad (the ones I encountered were not detailed enough or out of my league).
Note: I came to declarative paradigm through Prolog and am still in the very early process of discovering Haskell, so I might miss important things.
Note2: any general advice about my code is welcome.
EDIT: thanks to delnan's help, I compiled the program and it now runs in 5 seconds, so I mainly look for hints on memoization now (even if ideas about the existing x6 gap are still welcome).

After having compiled it with optimisations, there are still several differences to the C programme
you use div, while the C programme uses machine division (which truncates) [but any self-respecting C compiler transforms that into a shift, so that makes it yet faster], that would be quot in Haskell; that reduced the run time by some 15% here.
the C programme uses fixed-width 64-bit (or even 32-bit, but then it's just luck that it gets the correct answer, since some intermediate values exceed 32-bit range) integers, the Haskell programme uses arbitrary precision Integers. If you have 64-bit Ints in your GHC (64-bit OS other than Windows), replace Integer with Int. That reduced the run time by a factor of about 3 here. If you're on a 32-bit system, you're out of luck, GHC doesn't use native 64-bit instructions there, these operations are implemented as C calls, that's still rather slow.
For the memoisation, you can outsource it to one of the memoisation packages on hackage, the only one that I remember is data-memocombinators, but there are others. Or you can do it yourself, for example keeping a map of previously calculated values - that would work best in the State monad,
import Control.Monad.State.Strict
import qualified Data.Map as Map
import Data.Map (Map, singleton)
type Memo = Map Integer Int
syr :: Integer -> State Memo Int
syr n = do
mb <- gets (Map.lookup n)
case mb of
Just l -> return l
Nothing -> do
let m = if even n then n `quot` 2 else 3*n+1
l <- syr m
let l' = l+1
modify (Map.insert n l')
return l'
solve :: Integer -> Int -> Integer -> State Memo (Integer,Int)
solve maxi len start
| len > 1000000 = return (maxi,len)
| otherwise = do
l <- syr start
if len < l
then solve start l (start+1)
else solve maxi len (start+1)
p14 :: (Integer,Int)
p14 = evalState (solve 0 0 500000) (singleton 1 1)
but that will probably not gain too much (not even when you've added the necessary strictness). The trouble is that a lookup in a Map is not too cheap and an insertion is relatively expensive.
Another method is to keep a mutable array for the lookup. The code becomes more complicated, since you have to choose a reasonable upper bound for the values to cache (should be not much larger than the bound for the starting values) and deal with the parts of the sequences falling outside the memoised range. But an array lookup and write are fast. If you have 64-bit Ints, the below code runs pretty fast, here it takes 0.03s for a limit of 1 million, and 0.33s for a limit of 10 million, the corresponding (as closely as I reasonably could) C code runs in 0.018 resp. 0.2s.
module Main (main) where
import System.Environment (getArgs)
import Data.Array.ST
import Data.Array.Base
import Control.Monad.ST
import Data.Bits
import Data.Int
main :: IO ()
main = do
args <- getArgs
let bd = case args of
a:_ -> read a
_ -> 100000
print $ collMax bd
next :: Int -> Int
next n
| n .&. 1 == 0 = n `unsafeShiftR` 1
| otherwise = 3*n + 1
collMax :: Int -> (Int,Int16)
collMax upper = runST $ do
arr <- newArray (0,upper) 0 :: ST s (STUArray s Int Int16)
let go l m
| upper < m = go (l+1) $ next m
| otherwise = do
l' <- unsafeRead arr m
case l' of
0 -> do
l'' <- go 1 $ next m
unsafeWrite arr m (l'' + 1)
return (l+l'')
_ -> return (l+l'-1)
collect mi ml i
| upper < i = return (mi, ml)
| otherwise = do
l <- go 1 i
if l > ml
then collect i l (i+1)
else collect mi ml (i+1)
unsafeWrite arr 1 1
collect 1 1 2

Well, the C program uses unsigned long, but Integer can store arbitrarily large integers (it's a bignum). If you import Data.Word, then you can use Word, which is a machine-word-sized unsigned integer.
After replacing Integer with Word, and using ghc -O2 and gcc -O3, the C program runs in 0.72 seconds, while the Haskell programs runs in 1.92 seconds. 2.6x isn't bad. However, ghc -O2 doesn't always help, and this is one of the programs on which it doesn't! Using just -O, as you did, brings the runtime down to 1.90 seconds.
I tried replacing div with quot (which uses the same type of division as C; they only differ on negative inputs), but strangely it actually made the Haskell program run slightly slower for me.
You should be able to speed up the syr function with the help of this previous Stack Overflow question I answered about the same Project Euler problem.

On my current system (32-bit Core2Duo) your Haskell code, including all the optimizations given in the answers, takes 0.8s to compile and 1.2s to run.
You could transfer the run-time to compile-time, and reduce the run-time to nearly zero.
module Euler14 where
import Data.Word
import Language.Haskell.TH
terms :: Word -> Word
terms n = countTerms n 0
where
countTerms 1 acc = acc + 1
countTerms n acc | even n = countTerms (n `div` 2) (acc + 1)
| otherwise = countTerms (3 * n + 1) (acc + 1)
longestT :: Word -> Word -> (Word, Word)
longestT mi mx = find mi mx (0, 0)
where
find mi mx (ct,cn) | mi == mx = if ct > terms mi then (ct,cn) else (terms mi, mi)
| otherwise = find (mi + 1) mx
(if ct > terms mi then (ct,cn) else (terms mi, mi))
longest :: Word -> Word -> ExpQ
longest mi mx = return $ TupE [LitE (IntegerL (fromIntegral a)),
LitE (IntegerL (fromIntegral b))]
where
(a,b) = longestT mi mx
and
{-# LANGUAGE TemplateHaskell #-}
import Euler14
main = print $(longest 500000 999999)
On my system it takes 2.3s to compile this but the run-time goes down to 0.003s. Compile Time Function Execution (CTFE) is something you can't do in C/C++. The only other programming language that I know of that supports CTFE is the D programming language. And just to be complete, the C code takes 0.1s to compile and 0.7s to run.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Haskell solution for InterviewStreet String Similarity challenge - string

Related

Haskell ways to the 3n+1 challenge

Short-circuiting a function over a lower triangular(ish) array in Haskell: speed leads to ugly code

Why I take a message for stack overflow in Haskell?

Is it possible to use Haskell to reasonably solve large DP problems

Project Euler 14: performance compared to C and memoization

Categories

Resources