How do I optimize numerical integration performance in Haskell (with example) - haskell
How do I optimize numerical integration routine (comparing to C)?
What has been done so far:
I replaced lists with unboxed vectors (obvious).
I applied profiling techniques described in the book "Real World Haskell" http://book.realworldhaskell.org/read/profiling-and-optimization.html.
I have inlined some trivial functions and inserted a lot of bangs everywhere.
That gave about 10x speedup.
I refactored the code (i.e. extracted iterator function). That gave 3x speedup.
I replaced polymorphic signatures with Float, as suggested in the answer to the question "Optimizing numerical array performance in Haskell".
That gave almost 2x speedup.
I compile like this:
cabal exec ghc -- Simul.hs -O2 -fforce-recomp -fllvm -Wall
UPDATE As suggested by cchalmers, type Sample = (F, F) was replaced with
data Sample = Sample {-# UNPACK #-} !F {-# UNPACK #-} !F
The performance now is almost as good as C code. Can we do better?
{-# LANGUAGE BangPatterns #-}
module Main
where
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as UM
import qualified Control.Monad.Primitive as PrimitiveM
import Dynamics.Nonlin ( birefrP )
type F = Float
type Delay = U.Vector F
type Input = U.Vector F
-- Sample can be a vector of any length (x, y, z, ...)
data Sample = Sample {-# UNPACK #-} !F {-# UNPACK #-} !F
-- Pair is used to define exactly a pair of values
data Pair = Pair {-# UNPACK #-} !F {-# UNPACK #-} !F
type ParametrizedDelayFunction = (Sample, F) -> Sample
getX :: Sample -> F
getX (Sample a _) = a
{-# INLINE getX #-}
toDelay :: [F] -> Delay
toDelay = U.fromList
stepsPerNode :: Int
stepsPerNode = 40 -- Number of integration steps per node
infixl 6 ..+..
(..+..) :: Sample -> Sample -> Sample
(..+..) (Sample x1 y1) (Sample x2 y2) = Sample (x1 + x2) (y1 + y2)
{-# INLINE (..+..) #-}
infixl 7 .*..
(.*..) :: F -> Sample -> Sample
(.*..) c (Sample x2 y2) = Sample (c * x2) (c * y2)
{-# INLINE (.*..) #-}
-- | Ikeda model (dynamical system, DDE)
ikeda_model2
  :: (F -> F) -> (Sample, F) -> Sample
ikeda_model2 f (!(Sample x y), !x_h) = Sample x' y'
  where
    !x' = recip_epsilon * (-x + (f x_h))
    y' = 0
    recip_epsilon = 2^(6 :: Int)
-- | Integrate using improved Euler's method (fixed step).
--
-- hOver2 is already half of step size h
-- f is the function to integrate
-- x_i is current argument (x and y)
-- x_h is historical (delayed) value
-- x_h2 is the value after x_h
heun2 :: F -> ParametrizedDelayFunction
  -> Sample -> Pair -> Sample
heun2 hOver2 f !x !(Pair x_h x_h2) = x_1
  where
    !f1 = f (x, x_h)
    !x_1' = x ..+.. 2 * hOver2 .*.. f1
    !f2 = f (x_1', x_h2)
    !x_1 = x ..+.. hOver2 .*.. (f1 ..+.. f2)
initialCond :: Int -> (Sample, Delay, Int)
initialCond nodesN = (initialSampleXY, initialInterval, samplesPerDelay)
  where cdi = 1.1247695e-4 :: F -- A fixed point for birefrP
        initialInterval = U.replicate samplesPerDelay cdi
        samplesPerDelay = nodesN * stepsPerNode
        initialSampleXY = Sample 0.0 0.0
integrator
  :: PrimitiveM.PrimMonad m =>
     (Sample -> Pair -> Sample)
  -> Int
  -> Int
  -> (Sample, (Delay, Input))
  -> m (Sample, U.Vector F)
integrator iterate1 len total (xy0, (history0, input)) = do
    !v <- UM.new total
    go v 0 xy0
    history <- U.unsafeFreeze v
    -- Zero y value, currently not used
    let xy = Sample (history `U.unsafeIndex` (total - 1)) 0.0
    return (xy, history)
  where
    h i = history0 `U.unsafeIndex` i
    go !v !i !xy
      -- The first iteration
      | i == 0 = do
        let !r = iterate1 xy (Pair (h 0) (h 1))
        UM.unsafeWrite v i (getX r)
        go v 1 r
      | i < len - 1 = do
        let !r = iterate1 xy (Pair (h i) (h $ i + 1))
        UM.unsafeWrite v i (getX r)
        go v (i + 1) r
      | i == total = do
        return ()
      -- Iterations after the initial history has been exhausted
      | otherwise = do
        !newX0 <- if i == len - 1
                    then return (getX xy0)
                    else UM.unsafeRead v (i - len - 1)
        !newX <- UM.unsafeRead v (i - len)
        let !r = iterate1 xy (Pair newX0 newX)
        UM.unsafeWrite v i (getX r)
        go v (i + 1) r
-- Not used in this version
zero :: Input
zero = U.fromList []
nodes :: Int
nodes = 306
main :: IO ()
main = do
  let delays = 4000
      (sample0, hist0, delayLength) = initialCond nodes
      -- Iterator implements Heun's scheme
      iterator = heun2 (recip 2^(7::Int) :: F) (ikeda_model2 birefrP)
      totalComputedIterations = delayLength * delays
  -- Calculates all the time trace
  (xy1, history1) <- integrator iterator delayLength totalComputedIterations (sample0, (hist0, zero))
  putStrLn $ show $ getX xy1
  return ()
The nonlinear function (imported) can look like this:
data Parameters = Parameters { beta :: Float
                             , alpha :: Float
                             , phi :: Float } deriving Show
paramA :: Parameters
paramA = Parameters { beta = 1.1
                    , alpha = 1.0
                    , phi = 0.01 }
birefr :: Parameters -> Float -> Float
birefr par !x = 0.5 * beta' * (1 - alpha' * (cos $ 2.0 * (x + phi')))
  where
    !beta' = beta par
    !alpha' = alpha par
    !phi' = phi par
birefrP :: Float -> Float
birefrP = birefr paramA
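For readers unfamiliar with the scheme, the predictor-corrector step that heun2 performs can be sketched for a scalar delay equation. This is only an illustration, not the code from the question: heunStep and relax are hypothetical names, the sketch uses Double, and it takes the full step h rather than hOver2.

```haskell
-- Minimal sketch of one Heun (improved Euler) step for a scalar delay equation.
-- heunStep is a hypothetical name; it takes the full step h, unlike heun2.
heunStep
  :: Double                        -- step size h
  -> (Double -> Double -> Double)  -- right-hand side: f x xDelayed
  -> Double                        -- current value x
  -> (Double, Double)              -- delayed values at the start and end of the step
  -> Double
heunStep h f x (xh, xh2) = x + 0.5 * h * (f1 + f2)
  where
    f1    = f x xh        -- slope at the start of the step
    xPred = x + h * f1    -- Euler predictor
    f2    = f xPred xh2   -- slope at the predicted end point

-- Toy right-hand side (an assumption for the demo): relax x towards the delayed value.
relax :: Double -> Double -> Double
relax x xDelayed = xDelayed - x
```

In GHCi, heunStep 0.5 relax 1 (0, 0) evaluates to 0.625, which matches x * (1 - h + h^2/2) for this linear case.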
Related
Haskell: RandomGen drops half of values
I am writing a simple deterministic random number generator based on xorshift. The goal here is not to get a cryptographically secure or statistically perfect (pseudo-)random number generator, but to be able to achieve the same deterministic sequence of semi-random numbers across programming languages.
My Haskell program looks as follows:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module SimpleRNG where
import Data.Word (Word32)
import Data.Bits (xor, shift)
import System.Random (RandomGen(..))
import Control.Arrow
(|>) :: a -> (a -> b) -> b
(|>) x f = f x
infixl 0 |>
newtype SeedState = SeedState Word32
  deriving (Eq, Show, Enum, Bounded)
seed :: Integral a => a -> SeedState
seed = SeedState . fromIntegral
rand_r :: SeedState -> (Word32, SeedState)
rand_r (SeedState num) = (res, SeedState res)
  where
    res = num
      |> xorshift 13
      |> xorshift (-17)
      |> xorshift 5
    xorshift :: Int -> Word32 -> Word32
    xorshift amount x = x `xor` (shift x amount)
instance RandomGen SeedState where
  next seed_state = (first fromIntegral) $ rand_r seed_state
  genRange seed_state = (fromEnum (minBound `asTypeOf` seed_state),
                         fromEnum (maxBound `asTypeOf` seed_state))
  split seed_state@(SeedState num) = (seed_state', inverted_seed_state')
    where
      (_, seed_state') = next seed_state
      (_, inverted_seed_state') = next inverted_seed_state
      inverted_seed_state = SeedState (maxBound - num)
Now, for some reason, when running
take 10 $ System.Random.randoms (seed 42) :: [Word32]
it returns only the 'odd' results, compared to the output of the following Python program:
class SeedState(object):
    def __init__(self, seed = 42):
        self.data = seed
def rand_r(rng_state):
    num = rng_state.data
    num ^= (num << 13) % (2 ** 32)
    num ^= (num >> 17) % (2 ** 32)
    num ^= (num << 5) % (2 ** 32)
    rng_state.data = num
    return num
__global_rng_state = SeedState(42)
def rand():
    global __global_rng_state
    return rand_r(__global_rng_state)
def seed(seed):
    global __global_rng_state
    __global_rng_state = SeedState(seed)
if __name__ == '__main__':
    for x in range(0, 10):
        print(rand())
It seems like the internals of the System.Random module do some weird trickery with the return result of the generator (calling map fst $ take 10 $ iterate (\(_, rng) -> rand_r rng) (rand_r $ seed 42) gives the result I'd expect). This is odd, since the type returned by the generator is already a Word32, so it could/should just be passed on unaltered without any remapping happening.
What is going on here, and is there a way to plug this xorshift generator into System.Random in a way that returns the same results?
This is to do with the behaviour of System.Random.randoms, which repeatedly applies random to a RandomGen, not next.
class Random a where
  ...
  random :: (RandomGen g) => g -> (a, g)
The Random class is what allows you to reuse RandomGen instances across different enums, and the instance for Word32 (as well as nearly all other types) is defined as
instance Random Word32 where randomR = randomIvalIntegral; random = randomBounded
randomBounded just calls randomR, so the behaviour of random is decided by
randomIvalIntegral (l,h) = randomIvalInteger (toInteger l, toInteger h)
randomIvalInteger is an interesting function; you can read the source here. It is actually causing your problem, because the function will discard a certain number of intermediate values based on the range of the generator and the range being generated over.
To get the values you want, you just need to use next instead. The easiest way would be to define
randoms' g = x : (randoms' g')
  where (x, g') = next g
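The same idea can be sketched without the RandomGen class at all, by iterating the step function directly so that no intermediate values are discarded. This is a minimal sketch using only base; xorshift32 and xorshiftStream are hypothetical names, and the shifts mirror the 13, 17, 5 sequence from the question.

```haskell
import Data.Bits (shiftL, shiftR, xor)
import Data.Word (Word32)

-- One xorshift32 step; Word32 arithmetic wraps modulo 2^32 on its own.
xorshift32 :: Word32 -> Word32
xorshift32 x0 = x3
  where
    x1 = x0 `xor` (x0 `shiftL` 13)
    x2 = x1 `xor` (x1 `shiftR` 17)
    x3 = x2 `xor` (x2 `shiftL` 5)

-- Every output of the generator in order, with nothing skipped.
-- The seed itself is not part of the stream.
xorshiftStream :: Word32 -> [Word32]
xorshiftStream = drop 1 . iterate xorshift32
```

For example, take 10 (xorshiftStream 42) yields the first ten outputs; the first one is 11355432.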
How to "extend" a (partial) record selector?
NB: The type I define below is just a convenient example for the purposes of this question; I'm sure there's no need for me to roll my own definition of complex numbers in Haskell.
I don't know if I'm using the right terminology here, but the selector r below is an example of what I mean by a "partial" record selector:
data Complex = Polar { r :: Float, y :: Float }
             | Rectangular { x :: Float, y :: Float }
             deriving Show
r is "partial" because it cannot be applied to all Complex values; e.g.
r $ Polar 3 0
-- 3.0
...but
r $ Rectangular 3 0
-- *** Exception: No match in record selector r
In this case, however, there's a sensible definition for r $ Rectangular x y, namely:
-- assuming {-# LANGUAGE RecordWildCards #-}
r :: Complex -> Float
r Rectangular { .. } = sqrt $ (x * x) + (y * y)
GHCi rejects this definition of r with a "multiple declarations of ‘r’" error.
Is there a way to extend r so that it can be applied to any Complex value?
Of course, I realize that I can define something like
-- assuming {-# LANGUAGE RecordWildCards #-}
modulus :: Complex -> Float
modulus Polar { .. } = r
modulus Rectangular { .. } = sqrt $ (x * x) + (y * y)
...but I want to know if it is possible to extend the already existing selector r.
No, and IMO such record selectors should never be introduced in the first place. I'd write this as
type ℝ = Float -- Note that Double is usually more sensible
newtype S¹ = S¹ {ϑ :: ℝ} -- in [-π, π[
newtype ℝPlus = ℝPlus {posℝ :: ℝ} -- in [0, ∞[
data Complex = Polar ℝPlus S¹
             | Rectangular ℝ ℝ
             deriving Show
This way, there is no error potential in the form of partial record selectors, and also no confusion about what to unpack, etc. Even for such a “non-record type”, you can write your own accessors, preferably in lens form:
import Control.Lens
r :: Lens' Complex ℝPlus
r = lens get set
  where get (Polar r _) = r
        get (Rectangular x y) = ℝPlus . sqrt $ x^2 + y^2
        set (Polar _ θ) r = Polar r θ
        set (Rectangular x y) (ℝPlus r) = Rectangular (x * η) (y * η)
          where η = r / sqrt (x^2 + y^2)
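If a lens is more machinery than needed, a plain total function gives the read-only half of the same idea. The sketch below is an assumption-laden simplification: it uses bare Float fields instead of the wrapper newtypes above, and modulus is the name borrowed from the question.

```haskell
-- Constructors without record fields, so no partial selectors are generated.
data Complex
  = Polar Float Float        -- radius, angle
  | Rectangular Float Float  -- real part, imaginary part
  deriving Show

-- Total accessor: defined for every constructor of Complex.
modulus :: Complex -> Float
modulus (Polar radius _)  = radius
modulus (Rectangular a b) = sqrt (a * a + b * b)
```

Unlike the lens, this offers no setter; it only removes the runtime failure for Rectangular values.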
How to define a parameterized similarity class (an ==-like operator with 3rd param) in Haskell?
How to derive a parameterized similarity in a way that it would be convenient to use in Haskell?
The class should be such that the domain can be numeric or text (and possibly something else), and the parameter controlling the internals of the comparison function can also be of several types.
Below, you may find one approach that uses two class parameters. What implications does this design entail if the goal is to define several "similarity or equality groups"? (What kind of use cases would be hard to implement compared to some alternative implementation?) In this example, the similarity groups of words could be defined to be edit distances of one, two etc., and for doubles to be different precisions.
Some of the methods take both numeric and textual inputs, like the "quiteSimilar" method. Why not use just some distance? Some of the similarities should be definable by the user of the parameterized equality, e.g. on text (words) they could be based on synonyms.
And on doubles, well, I don't know yet what kind of comparisons will be needed. (Suggestions are welcome.) After equalities comes the question of how to compare the order of items so that similar items will be deemed to be equal and not the larger and smaller, see the last line of the output.
{-# LANGUAGE MultiParamTypeClasses #-}
import Data.Array
import qualified Data.Text as T
-- parameterized eq
class Peq a b where
  peq :: a -> b -> b -> Bool
instance Peq Double Double where
  peq = almostEqRelPrec
instance Peq Int T.Text where
  peq = editDistance
class Comment a where
  quiteSimilar :: a -> a -> T.Text
instance Comment Double where
  quiteSimilar a b = if peq (epsilon * 100::Double) a b then T.pack "alike" else T.pack "unalike"
instance Comment T.Text where
  quiteSimilar a b = if peq (1::Int) a b then T.pack "alike" else T.pack "unalike"
x1' x = quiteSimilar 0.25 (0.25 - x * epsilon :: Double)
x1 = quiteSimilar 0.25 (0.25 - 25 * epsilon :: Double)
x2 = quiteSimilar 0.25 (0.25 - 26 * epsilon :: Double)
x3' x = quiteSimilar 1e12 (1e12 - x * ulp 1e12 :: Double)
x3 = quiteSimilar 1e12 (1e12 - 181 * ulp 1e12 :: Double)
x4 = quiteSimilar 1e12 (1e12 - 182 * ulp 1e12 :: Double)
u181 = 181 * ulp 1e12 :: Double
main = do
  let a = 0.2 + 0.65 :: Double
      b = 0.85 :: Double
      s = T.pack "trial"
      t = T.pack "tr1al"
  putStrLn $ "0.2 + 0.65 = " ++ show a ++ " and compared to " ++ show b ++ ", it is " ++ T.unpack (quiteSimilar a b)
  putStrLn $ "Texts " ++ T.unpack s ++ " and " ++ T.unpack t ++ " are " ++ T.unpack (quiteSimilar s t)
  putStrLn $ "Note that " ++ show a ++ " > " ++ show b ++ " is " ++ show (a > b)
-- package Numeric.Limits contains this one
epsilon :: RealFloat a => a
epsilon = r
  where r = 1 - encodeFloat (m-1) e
        (m, e) = decodeFloat (1 `asTypeOf` r)
ulp :: RealFloat a => a -> a
ulp a = r
  where r = a - encodeFloat (m-1) e
        (m, e) = decodeFloat (a `asTypeOf` r)
almostEqRelPrec :: (RealFloat a) => a -> a -> a -> Bool
almostEqRelPrec maxRelPrec a b = d <= (largest * maxRelPrec)
  where
    d = abs $ a - b
    largest = max (abs a) (abs b)
editDistance :: Int -> T.Text -> T.Text -> Bool
editDistance i a b = i == editDistance' (show a) (show b)
-- from https://wiki.haskell.org/Edit_distance
-- see also https://hackage.haskell.org/package/edit-distance-0.2.2.1
editDistance' :: Eq a => [a] -> [a] -> Int
editDistance' xs ys = table ! (m,n)
  where
    (m,n) = (length xs, length ys)
    x = array (1,m) (zip [1..] xs)
    y = array (1,n) (zip [1..] ys)
    table :: Array (Int,Int) Int
    table = array bnds [(ij, dist ij) | ij <- range bnds]
    bnds = ((0,0),(m,n))
    dist (0,j) = j
    dist (i,0) = i
    dist (i,j) = minimum [table ! (i-1,j) + 1, table ! (i,j-1) + 1,
      if x ! i == y ! j then table ! (i-1,j-1) else 1 + table ! (i-1,j-1)]
On my machine, the output is:
0.2 + 0.65 = 0.8500000000000001 and compared to 0.85, it is alike
Texts trial and tr1al are alike
Note that 0.8500000000000001 > 0.85 is True
Edit:
Trying to rephrase the question: could this be achieved more elegantly with a similarity class that has only one parameter a and not two (a and b)? I have a feeling that multiparameter classes may turn out to be difficult later on. Is this a needless fear?
The first solution along this line that came to my mind was to define a similarity class with one parameter a and a class for functions having two parameters. In the instances, one type would be constrained to be the similarity class parameter and the other would be for the actual method returning Bool.
Are there some benefits of using the latter approach over the one presented? Or rather, what are the possible trade-offs between these approaches? And if there are still more ways to achieve this kind of thing, how do they compare?
could this be achieved more elegantly with a similarity class that has only one parameter a and not two (a and b)
Yes. Many MultiParamTypeClasses can be rewritten quite easily to single-param ones... by simply degrading the second parameter to an associated type family:
{-# LANGUAGE TypeFamilies #-}
class Peq b where
  type SimilarityThreshold b :: *
  peq :: SimilarityThreshold b -> b -> b -> Bool
instance Peq Double where
  type SimilarityThreshold Double = Double
  peq = almostEqRelPrec
instance Peq T.Text where
  type SimilarityThreshold T.Text = Int
  peq = editDistance
This is quite a bit more verbose, but indeed I tend to favour this style. The main difference is that the associated type family always assigns each type of values to be compared a single, unambiguous threshold type. This can save you some "could not deduce..." type inference trouble; however, it also means that you can't use two different metric types for a single type (but why would you, anyway).
Note that you can achieve exactly the same semantics by simply adding a fundep to your original class:
{-# LANGUAGE FunctionalDependencies #-}
class Peq a b | b -> a where
  peq :: a -> b -> b -> Bool
This is just a bit different in usage. Again I tend to favour the type families approach: it is more explicit in what the parameters are for, while at the same time keeping the second parameter from turning up in the constraints of any Peq-polymorphic function.
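To make the last point concrete, here is a self-contained sketch of the associated-type style with a polymorphic helper that carries a single constraint. All names (Similar, Threshold, similar, allSimilar) are hypothetical, and the Int instance is only a stand-in for an edit-distance comparison so that the example needs nothing beyond base.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.Kind (Type)

-- Single-parameter similarity class; the threshold type is determined
-- by the type being compared.
class Similar b where
  type Threshold b :: Type
  similar :: Threshold b -> b -> b -> Bool

-- Doubles: relative tolerance, scaled by the larger magnitude.
instance Similar Double where
  type Threshold Double = Double
  similar tol a b = abs (a - b) <= tol * max (abs a) (abs b)

-- Ints: absolute distance (a stand-in for an edit distance on text).
instance Similar Int where
  type Threshold Int = Int
  similar tol a b = abs (a - b) <= tol

-- A polymorphic helper needs only the one constraint; the threshold
-- type never appears as a separate class parameter.
allSimilar :: Similar b => Threshold b -> [b] -> Bool
allSimilar tol xs = and (zipWith (similar tol) xs (drop 1 xs))
```

With the two-parameter class, the signature of allSimilar would have to mention both type variables in its context.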
Haskell: Implementing a design with an interface and a polymorphic function
Again I'm requesting comments about how a given design should be implemented in Haskell. Thanks in advance to everyone providing helpful comments. Also I hope this could be an aid to other Haskell novices like me, having a practical sample code.
This time, we have a polymorphic function doSampling (in module Samples) that takes a generic function f and a list of reals (indexes) and returns a Samples (indexes, values=f(indexes)). We want to implement doSampling only once, as it doesn't matter whether f is a Polynomial or a Sinus. For that, we have introduced an interface Function, and have the Polynomial and Sinus types implement it. The following is the design being implemented:
Edit 1:
There is a debate on the Function interface (class in Haskell). It has been suggested that it is not actually necessary, as doSampling may take a "nude" function (Double -> Double) instead.
But how do you do that if you need some extra state within the nude function (coeffs for a polynomial, amp+freq+phase for a sinus)?
Edit 2:
Very good answers by kosmikus and by Chris Taylor. Thanks.
A key idea in both: have
doSampling :: (Double -> Double) -> [Double] -> Samples
That is: it takes a function (Double -> Double) (instead of Function) and a list, and returns samples.
My intention was to keep the state of Polynomials and Sinuses. That is not addressed in Chris' answer, but it is in kosmikus'. On the other hand, the weak point in kosmikus' version could be how to extend its Function definition if you don't have access to the source code.
I would also point out:
Chris' idea of encapsulating a polynomial or a sinus into a function (Double -> Double) by means of a factory function mkPolynomial or mkSinus that generates (using currying?) the desired function taking the appropriate parameters. (Although you can't consult the parameters later.)
kosmikus' idea of using value to transform (also using currying?) a Function into a (Double -> Double).
Both answers are worth reading as they have other little Haskell tricks to reduce and simplify code.
In sum:
Chris' answer does not support keeping the state of a Polynomial or of a Sinus.
kosmikus' answer is not extensible: adding new types of functions (Cosinus ...).
My answer (being verbose) does overcome the previous downsides, and it would allow (though this is not necessary for the problem) requiring Function types to have more associated functions apart from value (in the sense of how Java interfaces work).
My own approach
main (usage)
import Polynomial
import Sinus
import Function
import Samples
-- ...............................................................
p1 = Polynomial [1, 0, 0.5] -- p(x) = 1 + 0.5x^2
s1 = Sinus 2 0.5 3 -- f(x) = 2 sin(0.5x + 3)
-- ...............................................................
-- sample p1 from 0 to 5
m1 = doSampling p1 [0, 0.5 .. 5]
m2 = doSampling s1 [0, 0.5 .. 5]
-- ...............................................................
-- main
-- ...............................................................
main = do
  putStrLn "Hello"
  print $ value p1 2
  print $ value s1 (pi/2)
  print $ pairs m1
  print $ pairs m2
Function
module Function where
-- ...............................................................
-- "class type" : the types belonging to this family of types
-- must implement the following functions:
-- + value : takes a function and a real and returns a real
-- ...............................................................
class Function f where
  value :: f -> Double -> Double
-- f is a type variable, this is:
-- f is a type of the Function "family" not an actual function
Samples
module Samples where
import Function
-- ...............................................................
-- Samples: new data type
-- This is the constructor and says it requires
-- two lists, one for the indexes (xs values) and another
-- for the values ( ys = f (xs) )
-- this constructor should not be used, instead use
-- the "factory" function: new_Samples that performs some checks
-- ...............................................................
data Samples = Samples { indexes :: [Double] , values :: [Double] }
  deriving (Show)
-- ...............................................................
-- constructor: it checks lists are equal size, and indexes are sorted
new_Samples :: [Double] -> [Double] -> Samples
new_Samples ind val
  | (length ind) /= (length val) = samplesVoid
  | not $ isSorted ind = samplesVoid
  | otherwise = Samples ind val
-- ...............................................................
-- sample a function
-- it takes a function f and a list of indexes and returns
-- a Samples calculating the values array as f(indexes)
doSampling :: (Function f) => f -> [Double] -> Samples
doSampling f ind = new_Samples ind vals
  where
    vals = [ value f x | x <- ind ]
-- ...............................................................
-- used as "error" in the construction
samplesVoid = Samples [] []
-- ...............................................................
size :: Samples -> Int
size samples = length (indexes samples)
-- ...............................................................
-- utility function to get a pair (index,value) out of a Samples
pairs :: Samples -> [(Double, Double)]
pairs samples = pairs' (indexes samples) (values samples)
pairs' :: [Double] -> [Double] -> [(Double, Double)]
pairs' [] [] = []
pairs' [i] [v] = [(i,v)]
pairs' (i:is) (v:vs) = (i,v) : pairs' is vs
-- ...............................................................
-- to check whether a list is sorted (<)
isSorted :: (Ord t) => [t] -> Bool
isSorted [] = True
isSorted [e] = True
isSorted (e1:(e2:tail))
  | e1 < e2 = isSorted (e2:tail)
  | otherwise = False
Sinus
module Sinus where
-- ...............................................................
import Function
-- ...............................................................
-- Sinus: new data type
-- This is the constructor and says it requires
-- three reals
-- ...............................................................
data Sinus = Sinus { amplitude :: Double, frequency :: Double, phase :: Double }
  deriving (Show)
-- ...............................................................
-- we say that a Sinus is a Function (member of the class Function)
-- and then, how value is implemented
instance Function Sinus where
  value s x = (amplitude s) * sin ( (frequency s)*x + (phase s))
Polynomial
module Polynomial where
-- ...............................................................
import Function
-- ...............................................................
-- Polynomial: new data type
-- This is the constructor and says it requires
-- a list of coefficients
-- ...............................................................
data Polynomial = Polynomial { coeffs :: [Double] }
  deriving (Show)
-- ...............................................................
degree :: Polynomial -> Int
degree p = length (coeffs p) - 1
-- ...............................................................
-- we say that a Polynomial is a Function (member of the class Function)
-- and then, how value is implemented
instance Function Polynomial where
  value p x = value' (coeffs p) x 1
-- list of coeffs -> x -> pw (power of x) -> Double
value' :: [Double] -> Double -> Double -> Double
value' (c:[]) _ pw = c * pw
value' (c:cs) x pw = (c * pw) + (value' cs x (x*pw))
You certainly don't need the Function class. All this heavyweight class, instance, member variable fluff is one of the things that Haskell is designed to avoid. Pure functions can be much more flexible.
Here's a simple way of doing what you want.
type Sample = ([Double], [Double])
newSample xs vs
  | isSorted xs && length xs == length vs = (xs, vs)
  | otherwise = ([], [])
pairs = uncurry zip
doSampling :: (Double -> Double) -> [Double] -> Sample
doSampling f xs = newSample xs (map f xs)
mkPolynomial :: [Double] -> (Double -> Double)
mkPolynomial coefs x = go coefs
  where
    go [] = 0
    go (c:cs) = c + x * go cs
mkSinus :: Double -> Double -> Double -> (Double -> Double)
mkSinus amp freq phase x = amp * sin (freq * x + phase)
p1 = mkPolynomial [1, 0, 0.5] -- 1 + 0.5x^2
s1 = mkSinus 2 0.5 3 -- 2 sin(0.5x + 3)
m1 = doSampling p1 [0, 0.5 .. 5]
m2 = doSampling s1 [0, 0.5 .. 5]
main :: IO ()
main = do
  print $ p1 2
  print $ s1 (pi/2)
  print $ pairs m1
  print $ pairs m2
[Expanded my comment on request.]
I'd probably do this roughly as follows:
import Data.Functor
-- Use a datatype rather than a class. Yes, this makes it harder to
-- add new types of functions later, and in turn easier to define new
-- operations. ("expression problem")
data Function =
    Sinus { amplitude :: Double, frequency :: Double, phase :: Double }
  | Polynomial { coeffs :: [Double] }
  deriving (Show)
-- Interpreting a Function as an actual function.
value :: Function -> (Double -> Double)
value (Sinus amp freq ph) x = amp * sin (freq * x + ph)
value (Polynomial cs) x = value' cs x
-- Rewrite value' to not require non-empty lists. This can also be
-- nicely written as a fold.
value' :: [Double] -> Double -> Double
value' [] _ = 0
value' (c:cs) x = c + x * value' cs x
data Samples = Samples { indexes :: [Double] , values :: [Double] }
  deriving (Show)
-- Use Maybe to detect error conditions, instead of strange values
-- such as voidSamples.
newSamples :: [Double] -> [Double] -> Maybe Samples
newSamples ind val
  | length ind /= length val = Nothing
  | not $ isSorted ind = Nothing
  | otherwise = Just (Samples ind val)
doSampling :: (Double -> Double) -> [Double] -> Maybe Samples
doSampling f ind = newSamples ind (map f ind)
isSorted :: (Ord t) => [t] -> Bool
isSorted [] = True
isSorted [e] = True
isSorted (e1:e2:es)
  | e1 < e2 = isSorted (e2:es)
  | otherwise = False
-- This is just zip.
pairs :: Samples -> [(Double, Double)]
pairs (Samples idxs vals) = zip idxs vals
p1 = Polynomial [1, 0, 0.5] -- p(x) = 1 + 0.5x^2
s1 = Sinus 2 0.5 3 -- f(x) = 2 sin(0.5x + 3)
m1 = doSampling (value p1) [0, 0.5 .. 5]
m2 = doSampling (value s1) [0, 0.5 .. 5]
-- The <$> maps over a Maybe.
main = do
  putStrLn "Hello"
  print $ value p1 2
  print $ value s1 (pi/2)
  print $ pairs <$> m1
  print $ pairs <$> m2
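The remark that value' "can also be nicely written as a fold" can be sketched as follows. evalPoly is a hypothetical name; the coefficient order (ascending powers) is the one used in the answers above.

```haskell
-- Horner's rule as a right fold. Coefficients are in ascending order of
-- power, so [1, 0, 0.5] stands for 1 + 0.5x^2. An empty list is the zero
-- polynomial, so no special case for non-empty lists is needed.
evalPoly :: [Double] -> Double -> Double
evalPoly cs x = foldr (\c acc -> c + x * acc) 0 cs
```

For instance, evalPoly [1, 0, 0.5] 2 evaluates to 3.0, the same as 1 + 0.5 * 2^2.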
Optimizing Perlin noise in Haskell
(Dependencies for this program: vector --any and JuicyPixels >= 2. Code is available as Gist.)
{-# LANGUAGE Haskell2010 #-}
{-# LANGUAGE BangPatterns #-}
import Control.Arrow
import Data.Bits
import Data.Vector.Unboxed ((!))
import Data.Word
import System.Environment (getArgs)
import qualified Codec.Picture as P
import qualified Data.ByteString as B
import qualified Data.Vector.Unboxed as V
I tried to port Ken Perlin's improved noise to Haskell, but I'm not entirely sure that my method is correct. The main part is something that should generalize nicely to higher and lower dimensions, but that is something for later:
perlin3 :: (Ord a, Num a, RealFrac a, V.Unbox a) => Permutation -> (a, a, a) -> a
perlin3 p (!x', !y', !z')
  = let (!xX, !x) = actuallyProperFraction x'
        (!yY, !y) = actuallyProperFraction y'
        (!zZ, !z) = actuallyProperFraction z'
        !u = fade x
        !v = fade y
        !w = fade z
        !h = xX
        !a = next p h + yY
        !b = next p (h+1) + yY
        !aa = next p a + zZ
        !ab = next p (a+1) + zZ
        !ba = next p b + zZ
        !bb = next p (b+1) + zZ
        !aaa = next p aa
        !aab = next p (aa+1)
        !aba = next p ab
        !abb = next p (ab+1)
        !baa = next p ba
        !bab = next p (ba+1)
        !bba = next p bb
        !bbb = next p (bb+1)
    in lerp w (lerp v (lerp u (grad aaa (x, y, z))
                              (grad baa (x-1, y, z)))
                      (lerp u (grad aba (x, y-1, z))
                              (grad bba (x-1, y-1, z))))
              (lerp v (lerp u (grad aab (x, y, z-1))
                              (grad bab (x-1, y, z-1)))
                      (lerp u (grad abb (x, y-1, z-1))
                              (grad bbb (x-1, y-1, z-1))))
This is of course accompanied by a few functions mentioned in the perlin3 function, which I hope are as efficient as possible:
fade :: (Ord a, Num a) => a -> a
fade !t | 0 <= t, t <= 1 = t * t * t * (t * (t * 6 - 15) + 10)
lerp :: (Ord a, Num a) => a -> a -> a -> a
lerp !t !a !b | 0 <= t, t <= 1 = a + t * (b - a)
grad :: (Bits hash, Integral hash, Num a, V.Unbox a) => hash -> (a, a, a) -> a
grad !hash (!x, !y, !z) = dot3 (vks `V.unsafeIndex` fromIntegral (hash .&. 15)) (x, y, z)
  where
    vks = V.fromList
      [ (1,1,0), (-1,1,0), (1,-1,0), (-1,-1,0)
      , (1,0,1), (-1,0,1), (1,0,-1), (-1,0,-1)
      , (0,1,1), (0,-1,1), (0,1,-1), (0,-1,-1)
      , (1,1,0), (-1,1,0), (0,-1,1), (0,-1,-1)
      ]
dot3 :: Num a => (a, a, a) -> (a, a, a) -> a
dot3 (!x0, !y0, !z0) (!x1, !y1, !z1) = x0 * x1 + y0 * y1 + z0 * z1
-- Unlike `properFraction`, `actuallyProperFraction` rounds as intended.
actuallyProperFraction :: (RealFrac a, Integral b) => a -> (b, a)
actuallyProperFraction x
  = let (ipart, fpart) = properFraction x
        r = if x >= 0 then (ipart, fpart)
                      else (ipart-1, 1+fpart)
    in r
For the permutation group, I just copied the one Perlin used on his website:
newtype Permutation = Permutation (V.Vector Word8)
mkPermutation :: [Word8] -> Permutation
mkPermutation xs
  | length xs >= 256
  = Permutation . V.fromList $ xs
permutation :: Permutation
permutation = mkPermutation
  [151,160,137,91,90,15,
  131,13,201,95,96,53,194,233,7,225,140,36,103,30,69,142,8,99,37,240,21,10,23,
  190, 6,148,247,120,234,75,0,26,197,62,94,252,219,203,117,35,11,32,57,177,33,
  88,237,149,56,87,174,20,125,136,171,168, 68,175,74,165,71,134,139,48,27,166,
  77,146,158,231,83,111,229,122,60,211,133,230,220,105,92,41,55,46,245,40,244,
  102,143,54, 65,25,63,161, 1,216,80,73,209,76,132,187,208, 89,18,169,200,196,
  135,130,116,188,159,86,164,100,109,198,173,186, 3,64,52,217,226,250,124,123,
  5,202,38,147,118,126,255,82,85,212,207,206,59,227,47,16,58,17,182,189,28,42,
  223,183,170,213,119,248,152, 2,44,154,163, 70,221,153,101,155,167, 43,172,9,
  129,22,39,253, 19,98,108,110,79,113,224,232,178,185, 112,104,218,246,97,228,
  251,34,242,193,238,210,144,12,191,179,162,241, 81,51,145,235,249,14,239,107,
  49,192,214, 31,181,199,106,157,184, 84,204,176,115,121,50,45,127, 4,150,254,
  138,236,205,93,222,114,67,29,24,72,243,141,128,195,78,66,215,61,156,180
  ]
next :: Permutation -> Word8 -> Word8
next (Permutation !v) !idx'
  = v `V.unsafeIndex` (fromIntegral $ idx' .&. 0xFF)
And all this is tied together with JuicyPixels:
main = do
  [target] <- getArgs
  let image = P.generateImage pixelRenderer 512 512
  P.writePng target image
  where
    pixelRenderer, pixelRenderer' :: Int -> Int -> Word8
    pixelRenderer !x !y = floor $ ((perlin3 permutation ((fromIntegral x - 256) / 32,
      (fromIntegral y - 256) / 32, 0 :: Double))+1)/2 * 128
    -- This code is much more readable, but also much slower.
    pixelRenderer' x y = (\w -> floor $ ((w+1)/2 * 128)) -- w should be in [-1,+1]
      . perlin3 permutation
      . (\(x,y,z) -> ((x-256)/32, (y-256)/32, (z-256)/32))
      $ (fromIntegral x, fromIntegral y, 0 :: Double)
My problem is that perlin3 seems very slow to me. If I profile it, pixelRenderer is getting a lot of time as well, but I'll ignore that for now. I don't know how to optimize perlin3. I tried to hint GHC with bang patterns, which cuts the execution time in half, so that's nice. Explicitly specializing and inlining barely helps with ghc -O. Is perlin3 supposed to be this slow?
UPDATE: an earlier version of this question mentioned a bug in my code. This problem has been resolved; it turns out my old version of actuallyProperFraction was buggy. It implicitly rounded the integral part of a floating point number to Word8, and then subtracted it from the floating point number to get the fractional part. Since Word8 can only take values between 0 and 255 inclusive, this won't work properly for numbers outside that range, including negative numbers.
This code appears to be mostly computation-bound. It can be improved a little bit, but not by much unless there's a way to use fewer array lookups and less arithmetic.
There are two useful tools for measuring performance: profiling and code dumps. I added an SCC annotation to perlin3 so that it would show up in the profile. Then I compiled with ghc -O2 -fforce-recomp -ddump-simpl -prof -auto. The -ddump-simpl flag prints the simplified code.
Profiling: On my computer, it takes 0.60 seconds to run the program, and about 20% of execution time (0.12 seconds) is spent in perlin3 according to the profile. Note that the precision of my profile info is about +/-3%.
Simplifier output: The simplifier produces fairly clean code. perlin3 gets inlined into pixelRenderer, so that's the part of the output you want to look at. Most of the code consists of unboxed array reads and unboxed arithmetic. To improve performance, we want to eliminate some of this arithmetic.
An easy change is to eliminate the run-time checks on SomeFraction (which doesn't appear in your question, but is part of the code that you uploaded). This reduces the program's execution time to 0.56 seconds.
-- someFraction t | 0 <= t, t < 1 = SomeFraction t
someFraction t = SomeFraction t
Next, there are several array lookups that show up in the simplifier like this:
case GHC.Prim.indexWord8Array#
       ipv3_s23a
       (GHC.Prim.+#
          ipv1_s21N
          (GHC.Prim.word2Int#
             (GHC.Prim.and#
                (GHC.Prim.narrow8Word#
                   (GHC.Prim.plusWord# ipv5_s256 (__word 1)))
                (__word 255))))
The primitive operation narrow8Word# is for coercing from an Int to a Word8. We can get rid of this coercion by using Int instead of Word8 in the definition of next.
next :: Permutation -> Int -> Int
next (Permutation !v) !idx'
  = fromIntegral $ v `V.unsafeIndex` (fromIntegral idx' .&. 0xFF)
This reduces the program's execution time to 0.54 seconds. Considering just the time spent in perlin3, the execution time has fallen (roughly) from 0.12 to 0.06 seconds. Although it's hard to measure where the rest of the time is going, it's most likely spread out among the remaining arithmetic and array accesses.
On my machine, the reference code with Heatsink's optimisations takes 0.19 secs.
Firstly, I moved from JuicyPixels to yarr and yarr-image-io with my favourite flags, -Odph -rtsopts -threaded -fno-liberate-case -funbox-strict-fields -fexpose-all-unfoldings -funfolding-keeness-factor1000 -fsimpl-tick-factor=500 -fllvm -optlo-O3 (they are given here):
import Data.Yarr as Y
import Data.Yarr.IO.Image as Y
...
main = do
  [target] <- getArgs
  image <- dComputeS $ fromFunction (512, 512) (return . pixelRenderer)
  Y.writeImage target (Grey image)
  where
    pixelRenderer, pixelRenderer' :: Dim2 -> Word8
    pixelRenderer (y, x) = floor $ ((perlin3 permutation ((fromIntegral x - 256) / 32,
      (fromIntegral y - 256) / 32, 0 :: Double))+1)/2 * 128
    -- This code is much more readable, but also much slower.
    pixelRenderer' (y, x) = (\w -> floor $ ((w+1)/2 * 128)) -- w should be in [-1,+1]
      . perlin3 permutation
      . (\(x,y,z) -> ((x-256)/32, (y-256)/32, (z-256)/32))
      $ (fromIntegral x, fromIntegral y, 0 :: Double)
This makes the program 30% faster: 0.13 seconds.
Secondly, I replaced uses of the standard floor with
doubleToByte :: Double -> Word8
doubleToByte f = fromIntegral (truncate f :: Int)
This is a known issue (google "haskell floor performance"). Execution time is reduced to 52 ms (0.052 secs), almost 3 times less.
Finally, just for fun, I tried to compute the noise in parallel (dComputeP instead of dComputeS and +RTS -N4 on the command line). The program took 36 ms, including an I/O constant of about 10 ms.
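The floor replacement can be tried in isolation. The sketch below repeats doubleToByte from the answer and adds a hypothetical helper, toGrey, which applies the same scaling as pixelRenderer. Note the caveat in the comments: truncate and floor agree only for non-negative inputs, which holds here as long as the noise value stays in [-1, 1].

```haskell
import Data.Word (Word8)

-- Convert via Int with truncate instead of calling floor at type Word8.
-- Caveat: truncate rounds towards zero, so this matches floor only for
-- non-negative inputs, and values outside 0..255 wrap around.
doubleToByte :: Double -> Word8
doubleToByte f = fromIntegral (truncate f :: Int)

-- Map a noise value w in [-1, 1] to a grey level, as pixelRenderer does.
-- toGrey is a hypothetical helper name, not part of the original code.
toGrey :: Double -> Word8
toGrey w = doubleToByte ((w + 1) / 2 * 128)
```

Whether this is faster than floor depends on the GHC version and flags, so it is worth measuring rather than assuming.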