Why is this recursive function so slow in Haskell? - haskell

I am new to Haskell, and I defined the following function:
febs :: (Integral a) => a -> a
febs n
  | n <= 0    = 0
  | n == 1    = 1
  | n == 2    = 1
  | otherwise = febs (n-1) + febs (n-2)
However, it runs very slowly: "febs 30" takes about 10 seconds, while the same function in C++ runs very fast.
int febs(int n)
{
    if (n == 1 || n == 2) {
        return 1;
    }
    return febs(n-1) + febs(n-2);
}
Is there any way to improve the speed of my Haskell function?

This is an odd comparison, for the following reasons:
You don't say whether you're compiling the Haskell code, or with what options. If you're just running it in ghci, then of course it will be slow - you're comparing interpreted code with compiled code.
Your Haskell code is polymorphic whereas your C++ code is monomorphic (that is, you've used a type class Integral a => a -> a instead of the concrete type Int -> Int). Your Haskell code is therefore more general than your C++ code, because it can handle arbitrarily large integers instead of being restricted to the range of an Int. It's possible that the compiler will optimize this away, but I'm not certain.
If I put the following code in a file fib.hs
fibs :: Int -> Int
fibs n = if n < 3 then 1 else fibs (n-1) + fibs (n-2)
main = print (fibs 30)
and compile it with ghc -O2 fib.hs then it runs fast enough that it appears instantaneous to me. You should try that, and see how it compares with the C++ code.

Try compiling with optimization. With GHC 7.4.1 and -O2, your program runs quite quickly:
$ time ./test
832040
real 0m0.057s
user 0m0.056s
sys 0m0.000s
This is with main = print (febs 30).
Regarding the polymorphism considerations in Chris Taylor's answer, here's febs 40 with OP's polymorphic Fibonacci function:
$ time ./test
102334155
real 0m5.670s
user 0m5.652s
sys 0m0.004s
And here is a non-polymorphic one, i.e. with OP's signature replaced with Int -> Int:
$ time ./test
102334155
real 0m0.820s
user 0m0.816s
sys 0m0.000s
Per Tikhon Jelvis' comment, it'd be interesting to see whether the speedup is due to replacing Integer with Int, or due to getting rid of polymorphism. Here's the same program again, except with febs moved to a new file per Daniel Fischer's comment, and with febs :: Integer -> Integer:
$ time ./test
102334155
real 0m5.648s
user 0m5.624s
sys 0m0.008s
Again, with febs in a different file, and with the same polymorphic signature as originally:
$ time ./test
102334155
real 0m16.610s
user 0m16.469s
sys 0m0.104s
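If you want to keep the general Integral signature but recover most of the monomorphic speed, one option (a sketch, not taken from the answers above) is a SPECIALIZE pragma in the defining module, which asks GHC to also generate a dedicated Int -> Int copy:

febs :: Integral a => a -> a
febs n
  | n <= 0    = 0
  | n <= 2    = 1
  | otherwise = febs (n - 1) + febs (n - 2)
{-# SPECIALIZE febs :: Int -> Int #-}

GHC then rewrites Int call sites to use the specialised copy, so the timings should be close to the Int -> Int numbers above.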

You could also write the function like this:
fibs = 0:1:zipWith (+) fibs (tail fibs)
It is very fast; even for a large n it answers almost immediately:
Prelude> take 1000 fibs
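To get a single Fibonacci number with an interface like the original febs, you can index into this list (a sketch; the fromIntegral is needed because !! takes an Int):

fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

febs :: Integral a => a -> Integer
febs n
  | n <= 0    = 0
  | otherwise = fibs !! fromIntegral n

Here febs 30 gives 832040 essentially instantly, because each element of fibs is computed only once and then shared.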

Related

Using timeout with non-IO function haskell

I have a function fun1 that is not in IO and can be computationally expensive, so I want to run it for at most a specified number of seconds. I found the function timeout, but it requires the computation to be in IO.
timeout :: Int -> IO a -> IO (Maybe a)
How can I circumvent this, or is there a better approach to achieve my goal?
Edit:
I revised the first sentence: fun1 is NOT in IO, it is of type fun1 :: Formula -> Bool.
Something close to what talex said, but with the seq moved, should work. Here is an example using an inefficient fib as the expensive computation.
Prelude> import System.Timeout
Prelude System.Timeout> :{
Prelude System.Timeout| let fib 0 = 0
Prelude System.Timeout| fib 1 = 1
Prelude System.Timeout| fib n = fib (n-1) + fib (n-2)
Prelude System.Timeout| :}
Prelude System.Timeout> timeout 1000000 (let x = fib 44 in x `seq` return x)
Nothing
Prelude System.Timeout>
Limiting function execution to a specific time length is not pure (i.e. it does not ensure the same result every time), hence you should not be pursuing such behavior outside of IO. You can, for example, use something evil like unsafePerformIO (timeout 1000 (pure fun1)) but such usage will quickly lead to programs that are hard to understand with unexpected quirks. A better idea may be to define a custom monad that allows limited time execution and can be lifted to IO but I don't know if such a thing exists.
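For illustration only, the evil variant mentioned above could be written like this (a sketch; evaluate is used instead of pure so the computation actually runs inside the timeout, and, as said above, this is impure and not recommended):

import System.IO.Unsafe (unsafePerformIO)
import System.Timeout (timeout)
import Control.Exception (evaluate)

-- NOT recommended: hides non-determinism behind a pure-looking type
timeoutEvil :: Int -> a -> Maybe a
timeoutEvil t x = unsafePerformIO (timeout t (evaluate x))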
import System.Timeout (timeout)
import Control.Exception (evaluate)
import Control.DeepSeq (NFData, force)
timeoutPure :: Int -> a -> IO (Maybe a)
timeoutPure t = timeout t . evaluate
timeoutPureDeep :: NFData a => Int -> a -> IO (Maybe a)
timeoutPureDeep t = timeoutPure t . force
You may not want to actually write these functions, but they demonstrate the right approach. evaluate is better than seq for this sort of thing, because seq can potentially be moved around by the compiler, escaping the timeout. I'm not sure if that's actually possible in this case, but it's better to just do the thing that's sure to work than to try to analyze carefully whether the riskier approach is okay.
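A hypothetical use of these helpers, assuming fun1 :: Formula -> Bool and some value someFormula from the question:

main :: IO ()
main = do
  result <- timeoutPureDeep 2000000 (fun1 someFormula)  -- budget of two seconds, in microseconds
  case result of
    Just b  -> print b
    Nothing -> putStrLn "fun1 timed out"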

Heap is full of PINNED

I have a small program that has a reasonable maximum residency but allocates linearly. At first, I thought it should be cons cells or I#, but running the program with -p -hc shows the heap overwhelmed by PINNED. Does anyone understand the reason and/or can suggest an improvement?
The program
-- task27.hs
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad
import Control.Monad.ST
import Control.Exception
import System.Random
import Data.Functor
import qualified Data.Vector.Generic.Mutable as V
import qualified Data.Vector.Unboxed as U

m = 120

task27 :: [Int] -> (Int, Int)
task27 l = runST $ do
  r <- V.replicate m 0 :: ST s (U.MVector s Int)
  let go []     = return (1, 2)
      go (a:as) = do
        let p = a `mod` m
        cur_lead <- r `V.read` p
        when (a > cur_lead) (V.write r p a)
        go as
  go l

randTest ::
  Int ->  -- Length of random testing sequence
  IO ()
randTest n =
  newStdGen <&>
  randoms   <&>
  take n    <&>
  task27    >>=
  print

main = randTest 1000000
My package.yaml:
name: task27
dependencies:
- base == 4.*
- vector
- random
executables:
  task27:
    main: task27.hs
    ghc-options: -O2
My cabal.project.local:
profiling: True
I run cabal -v0 run task27 -- +RTS -p -hc && hp2ps -e8in -c task27.hp and get a heap profile dominated by PINNED (the profile graph is not reproduced here).
I tried to add bangs here and there but that did not seem to help.
As #WillemVanOnsem says, in GHC terms, 35kB resident is minuscule. Whatever performance issue you have, it has nothing to do with this tiny bit of pinned memory. Originally, I said that this was probably the Vectors, but that's wrong. Data.Text uses pinned memory, but Data.Vector doesn't. This bit of PINNED memory looks like it's actually from the runtime system itself, so you can ignore it (see below).
In GHC code, "total allocation" is a measure of processing. A GHC program is an allocation engine. If it's not allocating, it's probably not doing anything (with rare exceptions). So, if you expect your algorithm to run in O(n) time, then it will also be O(n) in total allocation, usually gigabytes worth.
With respect to the "rare exceptions", a GHC program can run in constant "total allocation" but non-constant time if aggressive optimization allows computations using fully unboxed values. So, for example:
main = print (sum [1..10000000] :: Int)
runs in constant total allocation (e.g., 50kB allocated on the heap), because the Ints can be unboxed. For comparison,
main = print (sum [1..10000000] :: Integer)
runs with O(n) total allocation (e.g., 320MB allocated on the heap). By the way, try profiling this last program (and bump the count up until it runs long enough to generate a few seconds of profile data). You'll see that it uses the same amount of PINNED memory as your program, and the amount doesn't really change with the upper limit. So, this is just runtime system overhead.
Back to your example... If you are concerned about performance, the culprit is probably System.Random. This is an EXTREMELY slow random number generator. If I run your program with n = 10000000, it takes 4secs. If I replace the random number generator with a simple LCG:
randoms :: Word32 -> [Word32]
randoms seed = tail $ iterate lcg seed
  where
    lcg x = a * x + c
    a = 1664525
    c = 1013904223
it runs in 0.2secs, so 20 times faster.
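For reference, one way to wire this LCG into the test harness from the question (a sketch; the seed still comes from System.Random's randomIO, and fromIntegral converts the Word32 values into the [Int] that task27 expects):

import Data.Word (Word32)
import System.Random (randomIO)

randTest :: Int -> IO ()
randTest n = do
  seed <- randomIO :: IO Word32
  print (task27 (map fromIntegral (take n (randoms seed))))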

GHC - turning iterate into a tight loop

I'm trying to learn/evaluate Haskell and I'm struggling to get an efficient executable for a simple case.
The test I'm using is a PRNG sequence (replicating PCG32 RNG). I've written it as an iteration of a basic state transition function (I'm looking only at the state for now).
{-# LANGUAGE BangPatterns #-}
import System.Environment (getArgs)
import Data.Bits
import Data.Word
iterate' f !x = x : iterate' f (f x)
main = print $ pcg32_rng 100000000
pcg32_random_r :: Word64 -> Word64 -> Word64
pcg32_random_r !i !state = state * (6364136223846793005 :: Word64) + (i .|. 1)
{-# INLINE pcg32_random_r #-}
pcg32_rng_s = iterate' (pcg32_random_r 1) 0
pcg32_rng n = pcg32_rng_s !! (n - 1)
I can get that code to compile and run, but it still uses a lot more memory than it should and runs about 10x slower than the C equivalent. The main issue seems to be that the iteration is not turned into a simple loop.
What am I missing to get GHC to produce faster / more efficient code here?
EDIT
This is the C version I compare against, which captures in essence what I'm trying to achieve. I aim for a fair comparison, but let me know if I missed something.
#include <stdio.h>
#include <stdint.h>

int main() {
    uint64_t oldstate, state = 0;
    int i;
    for (i = 0; i < 100000000; i++) {
        oldstate = state;
        // Advance internal state
        state = oldstate * 6364136223846793005ULL + (1 | 1);
    }
    printf("%lu\n", state);
}
I tried initially with the Prelude iterate function, but that results in lazy evaluation and a stack overflow. The iterate' is aimed at fixing that issue.
My next step was to try to get GHC to inline pcg32_random_r, and that's where I added the strictness annotations, but that doesn't seem to be enough. When I look at the GHC Core, it is not inlined.
#WillemVanOnsem I confirm that with perform_n the result is on par with C, and the pcg32_random_r function was actually inlined. I'm reaching the limit of my grasp of Haskell and GHC at this stage. Can you elaborate on why perform_n performs better and how to decide when to use what?
Would this transformation be feasible automatically by the compiler, or is it something that requires a design decision?
The reason for asking the last question is that I would like to separate functionality from implementation choices (speed / space tradeoffs, ...) as much as possible, to maximize reuse, and I was hoping Haskell would help me there.
It looks to me like the issue is that you produce a list and then obtain the i-th element from that list. As a result you keep unfolding the list, constructing a new element each time you need to move further along it.
Instead of constructing such a list (which creates new nodes, performs memory allocations, and consumes a lot of memory), you can write a function that applies a given function n times:
perform_n :: (a -> a) -> Int -> a -> a
perform_n !f = step
  where
    step !n !x | n <= 0    = x
               | otherwise = step (n-1) (f x)
So now we can apply a function f n times. We can thus rewrite pcg32_rng like this:
pcg32_rng n = perform_n (pcg32_random_r 1) (n-1) 0
If I compile this with ghc -O2 file.hs (GHC 8.0.2) and run the file with time, I get:
$ time ./file
2264354473547460187
0.14user 0.00system 0:00.14elapsed 99%CPU (0avgtext+0avgdata 3408maxresident)k
0inputs+0outputs (0major+161minor)pagefaults 0swaps
the original file produces the following benchmarks:
$ time ./file2
2264354473547460187
0.54user 0.00system 0:00.55elapsed 99%CPU (0avgtext+0avgdata 3912maxresident)k
0inputs+0outputs (0major+287minor)pagefaults 0swaps
EDIT:
As #WillNess says, if you do not name the list, it can be garbage collected at runtime: if you walk through a list and do not keep a reference to its head, that head can be collected once we have stepped past it.
If, however, we construct a file like:
{-# LANGUAGE BangPatterns #-}
import System.Environment (getArgs)
import Data.Bits
import Data.Word
iterate' f !x = x : iterate' f (f x)
main = print $ pcg32_rng 100000000
pcg32_random_r :: Word64 -> Word64 -> Word64
pcg32_random_r !i !state = state * (6364136223846793005 :: Word64) + (i .|. 1)
{-# INLINE pcg32_random_r #-}
pcg32_rng n = iterate' (pcg32_random_r 1) 0 !! (n - 1)
we obtain:
$ time ./speedtest3
2264354473547460187
0.54user 0.01system 0:00.56elapsed 99%CPU (0avgtext+0avgdata 3908maxresident)k
0inputs+0outputs (0major+291minor)pagefaults 0swaps
Although the memory burden can be reduced, there is little impact on time. The reason is probably that working with list elements creates cons cells, so we do a lot of packing into and unpacking out of lists. This also means constructing a lot of objects (and performing memory allocations), which still produces overhead.

Project Euler 43: my attempt produces no output

I'm currently solving Project Euler problems to improve my Haskell. However, my attempt at problem no. 43 produces no output.
Just to be clear, I'm not asking for help on the algorithmic part, even if I got that wrong. I'm specifically asking for help on the Haskell part.
So I divided my attempt into simple functions. First I build a list holding all 0-9 pandigital numbers, then I define functions to cut those numbers into the interesting parts, and finally I keep only the correct ones:
import Data.List

main = print $ foldl1 (+) goodOnes

pands = [read x :: Integer | x <- permutations "0123456789", head x /= '0']

cut3from :: Integer -> Integer -> Integer
cut3from x n = mod (div x nd) 1000
  where
    l  = fromIntegral $ length $ show x :: Integer
    nd = 10 ^ (l - 3 - n)

cut :: Integer -> [Integer]
cut x = map (cut3from x) [1..7]

testDiv :: Integral a => [a] -> Bool
testDiv l = and zipped
  where
    zipped = zipWith mult l [2, 3, 5, 7, 11, 13, 17]

mult :: Integral a => a -> a -> Bool
mult a b = mod a b == 0

goodOnes = filter (testDiv . cut) pands
However, when I compile it (with -O2) and run it, it outputs nothing, even with +RTS -s.
I'd like advice on two points mainly:
why this code is wrong, and how to improve it
how I could have debugged it myself
As a side point, if you have advice on how to handle Integer and Int easily, please post it. I find it troublesome to use both together.
But any other remark is welcome!
EDIT: it seems that GHCi slowly builds the result list goodOnes and is able to answer after a long time (only in GHCi; when compiled it still behaves as reported). That's certainly not a behaviour I wish to observe in my programs. How could I fix that?
EDIT2: it now works when compiled too (code unchanged). I'm puzzled about all those inconsistencies and would welcome any explanation!
EDIT3: semi-happy ending: after a reboot, everything went back to normal.
To answer "how could I have debugged it myself":
Check out ghci.
prompt$ ghci
GHCi, version 6.12.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading package ffi-1.0 ... linking ... done.
Prelude>
It's an interactive interpreter that allows you to:
load files and play with the contents
Prelude> :load myfile.hs
[1 of 1] Compiling Main ( myfile.hs, interpreted )
Ok, modules loaded: Main.
*Main> xyz "abc"
3
type in definitions and use them
*Main> let f x = x + 3
evaluate expressions and see the results
*Main> f 14
17
inspect types and kinds
*Main> :t f
f :: (Num a) => a -> a
*Main> :k Maybe
Maybe :: * -> *
Make sure you test each of the little pieces before you test the whole shebang -- it's easier to find problems in small things than in big ones. Check out QuickCheck if you're into unit tests.
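For example, a minimal QuickCheck property for one of the small pieces above might look like this (a sketch, assuming the QuickCheck package is installed and the cut function from the question is in scope):

import Test.QuickCheck

-- cut should produce exactly seven chunks, each below 1000, for 10-digit inputs
prop_cutChunks :: Property
prop_cutChunks = forAll (choose (1023456789, 9876543210)) $ \x ->
  length (cut x) == 7 && all (< 1000) (cut x)

Running quickCheck prop_cutChunks in GHCi then tests it on 100 random inputs.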
For the Int vs. Integer issue, you could sidestep it by only using Integer (of course, it may be less efficient: YMMV). Data.List has functions prefixed with generic, e.g. genericLength, which are quite useful.
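For instance, cut3from could drop the fromIntegral by using genericLength (a sketch of the same function, not a required change):

import Data.List (genericLength)

cut3from :: Integer -> Integer -> Integer
cut3from x n = (x `div` nd) `mod` 1000
  where
    l  = genericLength (show x) :: Integer
    nd = 10 ^ (l - 3 - n)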
Here's how I compiled and ran it:
prompt$ ghc euler43.hs
prompt$ ./a.out
<some number is printed out>

Haskell: can't use getCPUTime

I have:
main :: IO ()
main = do
  iniciofibonaccimap <- getCPUTime
  let fibonaccimap = map fib listaVintesete
  fimfibonaccimap <- getCPUTime
  let difffibonaccimap = (fromIntegral (fimfibonaccimap - iniciofibonaccimap)) / (10^12)
  printf "Computation time fibonaccimap: %0.3f sec\n" (difffibonaccimap :: Double)
listaVintesete :: [Integer]
listaVintesete = replicate 100 27
fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
But
*Main> main
Computation time fibonaccimap: 0.000 sec
I do not understand why this happens.
Help me, thanks.
As others have said, this is due to lazy evaluation. To force evaluation you should use the deepseq package and BangPatterns:
{-# LANGUAGE BangPatterns #-}
import Control.DeepSeq
import Text.Printf
import System.CPUTime

main :: IO ()
main = do
  iniciofibonaccimap <- getCPUTime
  let !fibonaccimap = rnf $ map fib listaVintesete
  fimfibonaccimap <- getCPUTime
  let difffibonaccimap = (fromIntegral (fimfibonaccimap - iniciofibonaccimap)) / (10^12)
  printf "Computation time fibonaccimap: %0.3f sec\n" (difffibonaccimap :: Double)
...
In the above code you should notice three things:
It compiles (modulo the ... of functions you defined above). When you post code in questions, please make sure it runs (in other words, you should include the imports).
The use of rnf from deepseq. This forces the evaluation of each element in the list.
The bang pattern on !fibonaccimap, meaning "do this now, don't wait". This forces the list to be evaluated to weak-head normal form (whnf, basically just the first constructor (:)). Without this the rnf function would itself remain unevaluated.
Resulting in:
$ ghc --make ds.hs
$ ./ds
Computation time fibonaccimap: 6.603 sec
If you're intending to do benchmarking you should also use optimization (-O2) and the Criterion package instead of getCPUTime.
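For example, a minimal Criterion benchmark of the same computation could look like this (a sketch, assuming the criterion package and the fib and listaVintesete definitions from the question):

import Criterion.Main

main :: IO ()
main = defaultMain
  [ bench "map fib listaVintesete" $ nf (map fib) listaVintesete ]

Criterion takes care of forcing the result (nf evaluates it to normal form) and of running enough iterations to give statistically meaningful timings.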
Haskell is lazy. The computation you request in the line
let fibonaccimap = map fib listaVintesete
doesn't actually happen until you somehow use the value of fibonaccimap. Thus to measure the time used, you'll need to introduce something that will force the program to perform the actual computation.
ETA: I originally suggested printing the last element to force evaluation. As TomMD points out, this is nowhere near good enough -- I strongly recommend reading his response here for an actually working way to deal with this particular piece of code.
I suspect you are a "victim" of lazy evaluation. Nothing forces the evaluation of fibonaccimap between the timing calls, so it's not computed.
Edit
I suspect you're trying to benchmark your code, and in that case it should be pointed out that there are better ways to do this more reliably.
10^12 is an integer, which forces the value of fromIntegral to be an integer, which means difffibonaccimap is assigned a rounded value, so it's 0 if the time is less than half a second. (That's my guess, anyway. I don't have time to look into it.)
Lazy evaluation has in fact bitten you, as the other answers have said. Specifically, 'let' doesn't force the evaluation of an expression, it just scopes a variable. The computation won't actually happen until its value is demanded by something, which probably won't happen until an actual IO action needs its value. So you need to put your print statement between your getCPUTime evaluations. Of course, this will also get the CPU time used by print in there, but most of print's time is waiting on IO. (Terminals are slow.)
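A sketch of that approach, reusing the fib and listaVintesete definitions from the question (printing the sum demands the whole list, so the work happens between the two getCPUTime calls):

import Text.Printf
import System.CPUTime

main :: IO ()
main = do
  start <- getCPUTime
  let fibonaccimap = map fib listaVintesete
  print (sum fibonaccimap)  -- demanding the result here forces the computation
  end <- getCPUTime
  printf "Computation time fibonaccimap: %0.3f sec\n"
         (fromIntegral (end - start) / (10 ^ 12) :: Double)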
