Mutating data with immutable data structures

Mutating data with immutable data structures - haskell

I would like to implement a particular algorithm, but I'm having trouble finding a good data structure for the job. A simpler version of the algorithm works like the following:
Input: A set of points.
Output: A new set of points.
Step 1: For each point, calculate the closest points in a radius.
Step 2: For each point, calculate a value "v" from the closest points subset.
Step 3: For each point, calculate a new value "w" from the closest points and
the values "v" from the previous step, i.e, "w" depends on the neighbors
and "v" of each neighbor.
Step 4: Update points.
In C++, I can solve this like this:
struct Point {
Vector position;
double v, w;
std::vector<Point *> neighbors;
};
std::vector<Point> points = initializePoints();
calculateNeighbors(points);
calculateV(points); // points[0].v = value; for example.
calculateW(points);
With a naive structure such as a list of points, I cannot update the value "v" into the original set of points, and would need to calculate the neighbors twice. How can I avoid this and keep the functions pure, since calculating the neighbors is the most expensive part of the algorithm (over 30% of the time)?
PS.: For those experienced in numerical methods and CFD, this is a simplified version of the Smoothed Particle Hydrodynamics method.
Update: Changed step 3 so it is clearer.

It is a common myth that Haskell doesn't offer mutation at all. In reality, it offers a very special kind of mutation: a value can mutate exactly once, from un-evaluated to evaluated. The art of taking advantage of this special kind of mutation is called tying the knot. We will start with a data structure just like your one from C++:
data Vector -- held abstract
data Point = Point
{ position :: Vector
, v, w :: Double
, neighbors :: [Point]
}
Now, what we're going to do is build an Array Point whose neighbors contain pointers to other elements within the same array. The key features of Array in the following code are that it's spine-lazy (it doesn't force its elements too soon) and has fast random-access; you can substitute your favorite alternate data structure with these properties if you prefer.
There's lots of choices for the interface of the neighbor-finding function. For concreteness and to make my own job simple, I will assume you have a function that takes a Vector and a list of Vectors and gives the indices of neighbors.
findNeighbors :: Vector -> [Vector] -> [Int]
findNeighbors = undefined
Let's also put in place some types for computeV and computeW. For the nonce, we will ask that computeV live up to the informal contract you stated, namely, that it can look at the position and neighbors fields of any Point, but not the v or w fields. (Similarly, computeW may look at anything but the w fields of any Point it can get its hands on.) It is actually possible to enforce this at the type level without too many gymnastics, but for now let's skip that.
computeV, computeW :: Point -> Double
(computeV, computeW) = undefined
Now we are ready to build our (labeled) in-memory graph.
buildGraph :: [Vector] -> Array Int Point
buildGraph vs = answer where
answer = listArray (0, length vs-1) [point pos | pos <- vs]
point pos = this where
this = Point
{ position = pos
, v = computeV this
, w = computeW this
, neighbors = map (answer!) (findNeighbors pos vs)
}
And that's it, really. Now you can write your
newPositions :: Point -> [Vector]
newPositions = undefined
where newPositions is perfectly free to inspect any of the fields of the Point it's handed, and put all the functions together:
update :: [Vector] -> [Vector]
update = newPositions <=< elems . buildGraph
edit: ...to explain the "special kind of mutation" comment at the beginning: during evaluation, you can expect when you demand the w field of a Point that things will happen in this order: computeW will force the v field; then computeV will force the neighbors field; then the neighbors field will mutate from unevaluated to evaluated; then the v field will mutate from unevaluated to evaluated; then the w field will mutate from unevaluated to evaluated. These last three steps look very similar to the three mutation steps of your C++ algorithm!
double edit: I decided I wanted to see this thing run, so I instantiated all the things held abstract above with dummy implementations. I also wanted to see it evaluate things only once, since I wasn't even sure I'd done it right! So I threw in some trace calls. Here's a complete file:
import Control.Monad
import Data.Array
import Debug.Trace
announce s (Vector pos) = trace $ "computing " ++ s ++ " for position " ++ show pos
data Vector = Vector Double deriving Show
data Point = Point
{ position :: Vector
, v, w :: Double
, neighbors :: [Point]
}
findNeighbors :: Vector -> [Vector] -> [Int]
findNeighbors (Vector n) vs = [i | (i, Vector n') <- zip [0..] vs, abs (n - n') < 1]
computeV, computeW :: Point -> Double
computeV (Point pos _ _ neighbors) = sum [n | Point { position = Vector n } <- neighbors]
computeW (Point pos v _ neighbors) = sum [v | Point { v = v } <- neighbors]
buildGraph :: [Vector] -> Array Int Point
buildGraph vs = answer where
answer = listArray (0, length vs-1) [point pos | pos <- vs]
point pos = this where { this = Point
{ position = announce "position" pos $ pos
, v = announce "v" pos $ computeV this
, w = announce "w" pos $ computeW this
, neighbors = announce "neighbors" pos $ map (answer!) (findNeighbors pos vs)
} }
newPositions :: Point -> [Vector]
newPositions (Point { position = Vector n, v = v, w = w }) = [Vector (n*v), Vector w]
update :: [Vector] -> [Vector]
update = newPositions <=< elems . buildGraph
and a run in ghci:
*Main> length . show . update . map Vector $ [0, 0.25, 0.75, 1.25, 35]
computing position for position 0.0
computing v for position 0.0
computing neighbors for position 0.0
computing position for position 0.25
computing position for position 0.75
computing w for position 0.0
computing v for position 0.25
computing neighbors for position 0.25
computing v for position 0.75
computing neighbors for position 0.75
computing position for position 1.25
computing w for position 0.25
computing w for position 0.75
computing v for position 1.25
computing neighbors for position 1.25
computing w for position 1.25
computing position for position 35.0
computing v for position 35.0
computing neighbors for position 35.0
computing w for position 35.0
123
As you can see, each field is computed at most once for each position.

Can you do something like this? Given the following type signatures
calculateNeighbours :: [Point] -> [[Point]]
calculateV :: [Point] -> Double
calculateW :: [Point] -> Double -> Double
you can write
algorithm :: [Point] -> [(Point, Double, Double)]
algorithm pts = -- pts :: [Point]
let nbrs = calculateNeighbours pts -- nbrs :: [[Point]]
vs = map calculateV nbrs -- vs :: [Double]
ws = zipWith calculateW nbrs vs -- ws :: [Double]
in zip3 pts vs ws -- :: [(Point,Double,Double)]
This calculates the lists of neighbours only once, and re-uses the value in the computations for v and w.
If this isn't what you want, can you elaborate a little more?

I think you should either use Map (HashMap) to separately store v's (and w's) counted from your Point's, or use mutable variables to reflect your C++ algorithm. First method is more "functional", e.g. you may easily add parralelism into it, since all data is immutable, but it should be little slower, since you'll have to count hash each time you need to get v by point.

Related

Generate a random list of custom data type, where values provided to data constructor are somehow bounded within a range

I have defined a Point data type, with a single value constructor like so:
data Point = Point {
x :: Int,
y :: Int,
color :: Color
} deriving (Show, Eq)
data Color = None
| Black
| Red
| Green
| Blue
deriving (Show, Eq, Enum, Bounded)
I have found an example of making a Bounded Enum an instance of the Random class and have made Color an instance of it like so:
instance Random Color where
random g = case randomR (1, 4) g of
(r, g') -> (toEnum r, g')
randomR (a, b) g = case randomR (fromEnum a, fromEnum b) g of
(r, g') -> (toEnum r, g')
I was then able to find out how to make Point an instance of the Random class also:
instance Random Point where
randomR (Point xl yl cl, Point xr yr cr) g =
let (x, g1) = randomR (xl, xr) g
(y, g2) = randomR (yl, yr) g1
(c, g3) = randomR (cl, cr) g2
in (Point x y c, g3)
random g =
let (x, g1) = random g
(y, g2) = random g1
(c, g3) = random g2
in (Point x y c, g3)
So, this let's me make random point values. But, what I'd like to do is be able to create a list of random Point values, where the x and the y properties are bounded within some range whilst leaving the color property to be an unbounded random value. Is this possible with the way I am currently modelling the code, or do I need to rethink how I construct Point values? For instance, instead of making Point an instance of the Random class, should I just create a random list of Int in the IO monad and then have a pure function that creates n Points, using values from the random list to construct each Point value?
Edit, I think I have found out how to do it:
Without changing the above code, in the IO monad I can do the following:
solved :: IO ()
solved = do
randGen <- getStdGen
let x = 2
let randomPoints = take x $ randomRs (Point 0 0 None, Point 200 200 Blue) randGen
putStrLn $ "Random points: " ++ show randomPoints
This seems to work, randomRs appears to let me specify a range...
Presumably because my Point data type derives Eq?
Or
Is it because my x and y properties are Int (guessing here, but may be "bounded" by default) and I have Color derive bounded?

It works because of the properties of the Int and Color types, not because of the properties of Point. If one suppresses the Eq clause of Point, it still works.
Your code is overall quite good, however I would mention a few minor caveats.
In the Random instance for Point, you are chaining the generator states manually; this is a bit error prone, and monadic do notation is supposed to make it unnecessary. The Color instance could be simplified.
You are using IO where it is not really required. IO is just one instance of the MonadRandom class. If g is an instance of RandomGen, any Rand g is an instance of MonadRandom.
The random values you're getting are not reproducible from a program execution to the next one; this is because getStdGen implicitly uses the launch time as a random number generation seed. It may do that because it is IO-hosted. In many situations, this is a problem, as one might want to vary the choice of random sequence and the system parameters independently of each other.
Using monadic style, the basics of your code could be rewritten for example like this:
import System.Random
import System.Random.TF -- Threefish random number generator
import Control.Monad.Random
data Point = Point {
x :: Int,
y :: Int,
color :: Color
} deriving (Show, Eq)
data Color = None
| Black
| Red
| Green
| Blue
deriving (Show, Eq, Enum, Bounded)
instance Random Color where
randomR (a, b) g = let (r,g') = randomR (fromEnum a, fromEnum b) g
in (toEnum r, g')
random g = randomR (minBound::Color, maxBound::Color) g
singleRandomPoint :: -- monadic action for just one random point
MonadRandom mr => Int -> Int -> Color -> Int -> Int -> Color -> mr Point
singleRandomPoint xmin ymin cmin xmax ymax cmax =
do
-- avoid manual chaining of generator states:
x <- getRandomR (xmin, xmax)
y <- getRandomR (ymin, ymax)
c <- getRandomR (cmin, cmax)
return (Point x y c)
And then we can derive an expression returning an unlimited list of random points:
-- monadic action for an unlimited list of random points:
seqRandomPoints :: MonadRandom mr =>
Int -> Int -> Color -> Int -> Int -> Color -> mr [Point]
seqRandomPoints xmin ymin cmin xmax ymax cmax =
sequence (repeat (singleRandomPoint xmin ymin cmin xmax ymax cmax))
-- returns an unlimited list of random points:
randomPoints :: Int -> Int -> Int -> Color -> Int -> Int -> Color -> [Point]
randomPoints seed xmin ymin cmin xmax ymax cmax =
let
-- get random number generator:
-- using Threefish algorithm (TF) for better statistical properties
randGen = mkTFGen seed
action = seqRandomPoints xmin ymin cmin xmax ymax cmax
in
evalRand action randGen
Finally we can print the first few random points on stdout:
-- Small printing utility:
printListAsLines :: Show t => [t] -> IO()
printListAsLines xs = mapM_ (putStrLn . show) xs
solved01 :: IO ()
solved01 = do
let
seed = 42 -- for random number generator setup
-- unlimited list of random points:
allRandomPoints = randomPoints seed 0 0 None 200 200 Blue
count = 5
someRandomPoints = take count allRandomPoints
-- IO not used at all so far
putStrLn $ "Random points: "
printListAsLines someRandomPoints
main = solved01
Program execution (reproducible with constant seed):
$ randomPoints
Random points:
Point {x = 187, y = 56, color = Green}
Point {x = 131, y = 28, color = Black}
Point {x = 89, y = 135, color = Blue}
Point {x = 183, y = 190, color = Red}
Point {x = 27, y = 161, color = Green}
$
Should you prefer to just get a finite number of points and also get back the updated state of your random number generator, you would have to use replicate n instead of repeat, and runRand instead of evalRand.
Bit more details about the monadic approach here.

Coordinates for clockwise outwards spiral

I'm trying to make what I think is called an Ulam spiral using Haskell.
It needs to go outwards in a clockwise rotation:
6 - 7 - 8 - 9
| |
5 0 - 1 10
| | |
4 - 3 - 2 11
|
..15- 14- 13- 12
For each step I'm trying to create coordinates, the function would be given a number and return spiral coordinates to the length of input number eg:
mkSpiral 9
> [(0,0),(1,0),(1,-1),(0,-1),(-1,-1),(-1,0),(-1,1),(0,1),(1,1)]
(-1, 1) - (0, 1) - (1, 1)
|
(-1, 0) (0, 0) - (1, 0)
| |
(-1,-1) - (0,-1) - (1,-1)
I've seen Looping in a spiral solution, but this goes counter-clockwise and it's inputs need to the size of the matrix.
I also found this code which does what I need but it seems to go counterclock-wise, stepping up rather than stepping right then clockwise :(
type Spiral = Int
type Coordinate = (Int, Int)
-- number of squares on each side of the spiral
sideSquares :: Spiral -> Int
sideSquares sp = (sp * 2) - 1
-- the coordinates for all squares in the given spiral
coordinatesForSpiral :: Spiral -> [Coordinate]
coordinatesForSpiral 1 = [(0, 0)]
coordinatesForSpiral sp = [(0, 0)] ++ right ++ top ++ left ++ bottom
where fixed = sp - 1
sides = sideSquares sp - 1
right = [(x, y) | x <- [fixed], y <- take sides [-1*(fixed-1)..]]
top = [(x, y) | x <- reverse (take sides [-1*fixed..]), y <- [fixed]]
left = [(x, y) | x <- [-1*fixed], y <- reverse(take sides [-1*fixed..])]
bottom = [(x, y) | x <- take sides [-1*fixed+1..], y <- [-1*fixed]]
-- an endless list of coordinates (the complete spiral)
mkSpiral :: Int -> [Coordinate]
mkSpiral x = take x endlessSpiral
endlessSpiral :: [Coordinate]
endlessSpiral = endlessSpiral' 1
endlessSpiral' start = coordinatesForSpiral start ++ endlessSpiral' (start + 1)
After much experimentation I can't seem to change the rotation or starting step direction, could someone point me in the right way or a solution that doesn't use list comprehension as I find them tricky to decode?

Let us first take a look at how the directions of a spiral are looking:
R D L L U U R R R D D D L L L L U U U U ....
We can split this in sequences like:
n times n+1 times
_^_ __^__
/ \ / \
R … R D … D L L … L U U … U
\_ _/ \__ __/
v v
n times n+1 times
We can repeat that, each time incrementing n by two, like:
data Dir = R | D | L | U
spiralSeq :: Int -> [Dir]
spiralSeq n = rn R ++ rn D ++ rn1 L ++ rn1 U
where rn = replicate n
rn1 = replicate (n + 1)
spiral :: [Dir]
spiral = concatMap spiralSeq [1, 3..]
Now we can use Dir here to calculate the next coordinate, like:
move :: (Int, Int) -> Dir -> (Int, Int)
move (x, y) = go
where go R = (x+1, y)
go D = (x, y-1)
go L = (x-1, y)
go U = (x, y+1)
We can use scanl :: (a -> b -> a) -> a -> [b] -> [a] to generate the points, like:
spiralPos :: [(Int, Int)]
spiralPos = scanl move (0,0) spiral
This will yield an infinite list of coordinates for the clockwise spiral. We can use take :: Int -> [a] -> [a] to take the first k items:
Prelude> take 9 spiralPos
[(0,0),(1,0),(1,-1),(0,-1),(-1,-1),(-1,0),(-1,1),(0,1),(1,1)]

The idea with the following solution is that instead of trying to generate the coordinates directly, we’ll look at the directions from one point to the next. If you do that, you’ll notice that starting from the first point, we go 1× right, 1× down, 2× left, 2× up, 3× right, 3× down, 4× left… These can then be seperated into the direction and the number of times repeated:
direction: > v < ^ > v < …
# reps: 1 1 2 2 3 3 4 …
And this actually gives us two really straightforward patterns! The directions just rotate > to v to < to ^ to >, while the # of reps goes up by 1 every 2 times. Once we’ve made two infinite lists with these patterns, they can be combined together to get an overall list of directions >v<<^^>>>vvv<<<<…, which can then be iterated over to get the coordinate values.
Now, I’ve always thought that just giving someone a bunch of code as the solution is not the best way to learn, so I would highly encourage you to try implementing the above idea yourself before looking at my solution below.
Welcome back (if you did try to implement it yourself). Now: onto my own solution. First I define a Stream data type for an infinite stream:
data Stream a = Stream a (Stream a) deriving (Show)
Strictly speaking, I don’t need streams for this; Haskell’s predefined lists are perfectly adequate for this task. But I happen to like streams, and they make some of the pattern matching a bit easier (because I don’t have to deal with the empty list).
Next, I define a type for directions, as well as a function specifying how they interact with points:
-- Note: I can’t use plain Left and Right
-- since they conflict with constructors
-- of the ‘Either’ data type
data Dir = LeftDir | RightDir | Up | Down deriving (Show)
type Point = (Int, Int)
move :: Dir -> Point -> Point
move LeftDir (x,y) = (x-1,y)
move RightDir (x,y) = (x+1, y)
move Up (x,y) = (x,y+1)
move Down (x,y) = (x,y-1)
Now I go on to the problem itself. I’ll define two streams — one for the directions, and one for the number of repetitions of each direction:
dirStream :: Stream Dir
dirStream = Stream RightDir $ Stream Down $ Stream LeftDir $ Stream Up dirVals
numRepsStream :: Stream Int
numRepsStream = go 1
where
go n = Stream n $ Stream n $ go (n+1)
At this point we’ll need a function for replicating each element of a stream a specific number of times:
replicateS :: Stream Int -> Stream a -> Stream a
replicateS (Stream n ns) (Stream a as) = conss (replicate n a) $ replicateS ns as
where
-- add more than one element to the beginning of a stream
conss :: [a] -> Stream a -> Stream a
conss [] s = s
conss (x:xs) s = Stream x $ appends xs s
This gives replicateS dirStream numRepsStream for the stream of directions. Now we just need a function to convert those directions to coordinates, and we’ve solved the problem:
integrate :: Stream Dir -> Stream Point
integrate = go (0,0)
where
go p (Stream d ds) = Stream p (go (move d p) ds)
spiral :: Stream Point
spiral = integrate $ replicateS numRepsStream dirStream
Unfortunately, it’s somewhat inconvenient to print an infinite stream, so the following function is useful for debugging and printing purposes:
takeS :: Int -> Stream a -> [a]
takeS 0 _ = []; takeS n (Stream x xs) = x : (takeS (n-1) xs)

Debug memory issue in Haskell

I'm trying to solve the whole Advent of Code series in Haskell.
I'm encountering a memory issue while solving the 2015/06 exercise where there is a bunch of instructions to turn on, off and toggle lights on a grid. The goal is to count the number of lit lights at the end.
Given instructions are parsed and stored in a Instruction type, this is the type definition:
data Instruction = Instruction Op Range deriving Show
data Op = Off | On | Toggle | Nop deriving Show
data Range = Range Start End deriving Show
type Start = Point
type End = Start
data Point = Point Int Int deriving Show
This is the code that calculates the result. I'm trying to abstract over the fact that a light is a Boolean by using a typeclass
gridWidth, gridHeight :: Int
gridWidth = 1000
gridHeight = 1000
initialGrid :: Togglable a => Matrix a
initialGrid = matrix gridWidth gridHeight (const initialState)
instance Monoid Op where
mempty = Nop
instance Semigroup Op where
_ <> On = On
_ <> Off = Off
x <> Nop = x
Off <> Toggle = On
On <> Toggle = Off
Toggle <> Toggle = Nop
Nop <> Toggle = Toggle
class Togglable a where
initialState :: a
apply :: Op -> a -> a
instance Togglable Bool where
initialState = False
apply On = const True
apply Off = const False
apply Toggle = not
apply Nop = id
-- Does the Range of the instruction apply to this matrix coordinate?
(<?) :: Range -> (Int, Int) -> Bool
(<?) (Range start end) (x, y) = let
(Point x1 y1) = start
(Point x2 y2) = end
(mx, my) = (x-1, y-1) -- translate from matrix coords (they start from 1!)
in and [
mx >= min x1 x2, mx <= max x1 x2,
my >= min y1 y2, my <= max y1 y2
]
stepGenerator :: Instruction -> Matrix Op
stepGenerator (Instruction op r) = let
g coord = if r <? coord then op else Nop
in matrix gridWidth gridHeight g
allStepsMatrix :: [Instruction] -> Matrix Op
allStepsMatrix = mconcat.map stepGenerator
finalGrid :: Togglable a => Matrix a -> Matrix Op -> Matrix a
finalGrid z op = fmap apply op <*> z
countOn :: Matrix Bool -> Integer
countOn = toInteger.foldr (\x -> if x then (+1) else id) 0
partA :: Challenge (String -> Integer)
partA = Challenge $ countOn.finalGrid initialGrid.allStepsMatrix.parse
The solution will be the Integer returned by what's inside partA. parse works and has type parse :: String -> [Instruction]
The code compiles and runs with small matrices (e.g. 10x10), as soon as I turn gridWidth and gridHeight to 1000 I'm faced with a out of memory error, apparently generating from the allStepsMatrix function.
Any hint of what could be wrong here? Full code is on GitHub

I would strongly suggest not using a typeclass. Typeclasses are supposed to have laws, and they should be "rare", in the sense that each type has only a few valid implementations. I would suggest taking initialState and toggle as arguments, but even that is overkill, because the given instructions simply do not make sense with any type that isn't Bool. Just operate on a Matrix Bool directly and you can cut out a good chunk of the code you've written. However, I won't change anything for my answer.
In any case, I think the issue may be laziness. 1000 * 1000 = 1000000, so each Matrix will be several megabytes in size. On a 64-bit machine, a pointer is 8 bytes, so each Matrix is at least 8 MB, plus a few more for the data behind it. You are mconcating 300 of them (that's what I get from the site) together, but, because you are doing it lazily, all 300 matrices are resident simultaneously, so it's at least 2.4 GB, just for the structures themselves. The cost of filling each of those 300 million pointers with thunks also makes itself known—a thunk is at least one pointer (8 bytes, pointing to code in static memory, making another 2.4 GB), plus its payload, which, here, means more pointers, each one bestowing your computer with another 2.4 GB of memory pressure. I suggest deepseq:
instance NFData Op where
rnf Off = ()
rnf On = ()
rnf Toggle = ()
rnf Nop = ()
-- rnf x = x `seq` () but I like to be explicit
allStepsMatrix :: [Instruction] -> Matrix Op
allStepsMatrix = foldl' (\x y -> force (x <> y)) mempty . map stepGenerator
Usnig foldl' lets this operate in constant stack space, but foldl or foldr would also work, because a stack depth on the order of 300 is nothing. The force means that all the elements of each Matrix are evaluated. Before, each matrix kept the previous one alive by holding references to it, but now the references are removed when the elements are evaluated, so the GC can throw them out in a timely manner. I have tested this and it terminates in reasonable time with much better space usage.

Bad performance for parallel N-Body in Haskell

I wrote an N-body implementation in Haskell and tried to parallelize the updating of position of the particles (outer loop) using parList. However, the performance improvement from 1 to 3 cores is small, and it actually becomes worse for 4 cores. Here is the code, using 5 particles:
import Numeric.LinearAlgebra
import Data.List (zipWith4)
import Data.List.Split
import Control.DeepSeq
import System.Environment
import Control.Parallel.Strategies
type Vec = Vector Double
data Particle = Particle
{ tag :: Int
, position :: !Vec
, velocity :: !Vec
, mass :: Double
}
instance Eq Particle where
p1 == p2 = (tag p1) == (tag p2)
instance Show Particle where
show p = show $ position p
instance NFData Particle where
rnf p = t `seq` ps `seq` vel `seq` m `seq` ()
where
t = tag p
ps = position p
vel = velocity p
m = mass p
data System = System
{ number :: Int
, particles :: [Particle]
}
instance Show System where
show sys = show (particles sys)
instance NFData System where
rnf sys = rnf $ particles sys
g :: Double
g = 6.674e-11
-- | Construct a vector from a list of elements
mkVector :: [Double] -> Vec
mkVector = vector
-- | Construct a system from a given list of
-- position vectors, velocity vectors, and masses
mkSystem :: [(Vec, Vec, Double)] -> System
mkSystem xs = System
{ number = length xs
, particles = map (apply Particle) $ zipWith cons [0..] xs
}
where
cons y (a, b, c) = (y, a, b, c)
apply f (y, a, b, c) = f y a b c
-- | Calculate the gravitational force vector acted on p1 by p2
getForce :: Particle -> Particle -> Vec
getForce p1 p2
| p1 == p2 = scalar 0.0
| otherwise = scalar (g * (m1 * m2 / r^2)) * unit
where
m1 = mass p1
m2 = mass p2
r1 = position p1
r2 = position p2
dr = r2 - r1
r = norm_2 dr
unit = normalize dr
-- | Calculate the net gravitational force acted on the particle by
-- all the other particles
netForce :: Particle -> [Particle] -> Vec
netForce p ps = sum (map (getForce p) ps)
-- | Update the position and velocity of a particle in the system
-- through a given timestep
move :: Double -> System -> Particle -> Particle
move h sys p = let
(pos, vel) = (position p, velocity p)
acc = (netForce p $ particles sys) / (scalar $ mass p)
in
p { position = pos + scalar h * vel
, velocity = vel + scalar h * acc
}
stepSize = 1000
-- | Update the system through one timestep
evolve :: System -> System
evolve sys = sys { particles = ps' }
where
ps' = map (move stepSize sys) (particles sys) `using` parList rdeepseq
-- | Stream of system at different timesteps
evolution :: System -> [System]
evolution = iterate evolve
sys1 :: System
sys1 = mkSystem
[ (mkVector [1.496e11, 0], mkVector [0, 2.98e4], 5.974e24)
, (mkVector [2.279e11, 0], mkVector [0, 2.41e4], 6.42e23)
, (mkVector [5.79e10, 0], mkVector [0, 4.8e4], 3.3e23)
, (mkVector [0, 0], mkVector [0, 0], 1.98e30)
, (mkVector [ 1.08e11, 0], mkVector [0, 3.5e4], 4.87e24)
]
main :: IO ()
main = do
[n] <- fmap (map read) getArgs :: IO [Int]
let states = evolution sys1
finalState = states !! n
finalState `deepseq` (return ())
What is limiting the parallelism?

Two thoughts:
I don't recommend HMatrix. It's a “Matlab in a box” library, which not only goes strongly against Haskell's grain in terms of semantics (dynamical dimension checking† etc.) but also subverts Haskell's better sides in performance. The idea of HMatrix/Matlab is to work with large, monolithic, heap-allocated arrays and do as much as possible with linear algebra primitives (which are implemented in C). This approach makes sense in a slow language like Matlab or Python (you delegate the inner loops to another, faster language), but not really in Haskell: the external calls can't properly be inlined, unlike native Haskell functions, and they add some overhead which though it may be negligible when crunching large vectors, can definitely add up for small ones. Pointer indirection harms cache performance.
The most popular static, native LA library is linear, which takes a much more Haskell-oriented approach (functors everywhere), and has proper tight memory layout. (I'm actually not such a fan of the interface itself, but the data types can also be used with the very nice vector-space operators.)
Switching to the types from linear improves the performance (single-threaded) by a factor 10 for me.
import Data.VectorSpace
import Data.VectorSpace.Free
import Linear.V2
import Data.List (zipWith4)
import Data.List.Split
import Control.DeepSeq
import System.Environment
import Control.Parallel.Strategies
type Vec = V2 Double -- In case you're wondering: you don't need to
-- hard-code the dimension, you can also make your
-- functions _polymorphic_ on the vector space.
data Particle = Particle
{ tag :: !Int
, position :: !Vec
, velocity :: !Vec
, mass :: !Double
}
instance Eq Particle where
p1 == p2 = (tag p1) == (tag p2)
instance Show Particle where
show p = show $ position p
instance NFData Particle where
rnf p = t `seq` ps `seq` vel `seq` m `seq` ()
where
t = tag p
ps = position p
vel = velocity p
m = mass p
data System = System
{ number :: Int
, particles :: [Particle]
}
instance Show System where
show sys = show (particles sys)
instance NFData System where
rnf sys = rnf $ particles sys
g :: Double
g = 6.674e-11
-- | Construct a vector from a list of elements
mkVector :: [Double] -> Vec
mkVector [x,y] = V2 x y
-- | Construct a system from a given list of
-- position vectors, velocity vectors, and masses
mkSystem :: [(Vec, Vec, Double)] -> System
mkSystem xs = System
{ number = length xs
, particles = map (apply Particle) $ zipWith cons [0..] xs
}
where
cons y (a, b, c) = (y, a, b, c)
apply f (y, a, b, c) = f y a b c
-- | Calculate the gravitational force vector acted on p1 by p2
getForce :: Particle -> Particle -> Vec
getForce p1 p2
| p1 == p2 = zeroV
| otherwise = (g * (m1 * m2 / r^2)) *^ unit
where
m1 = mass p1
m2 = mass p2
r1 = position p1
r2 = position p2
dr = r2 - r1
r = magnitude dr
unit = dr ^/ r
-- | Calculate the net gravitational force acted on the particle by
-- all the other particles
netForce :: Particle -> [Particle] -> Vec
netForce p ps = sum (map (getForce p) ps)
-- | Update the position and velocity of a particle in the system
-- through a given timestep
move :: Double -> System -> Particle -> Particle
move h sys p = let
(pos, vel) = (position p, velocity p)
acc = (netForce p $ particles sys) ^/ mass p
in
p { position = pos ^+^ h *^ vel
, velocity = vel ^+^ h *^ acc
}
stepSize = 1000
-- | Update the system through one timestep
evolve :: System -> System
evolve sys = sys { particles = ps' }
where
ps' = map (move stepSize sys) (particles sys) `using` parList rdeepseq
-- | Stream of system at different timesteps
evolution :: System -> [System]
evolution = iterate evolve
sys1 :: System
sys1 = mkSystem
[ (mkVector [1.496e11, 0], mkVector [0, 2.98e4], 5.974e24)
, (mkVector [2.279e11, 0], mkVector [0, 2.41e4], 6.42e23)
, (mkVector [5.79e10, 0], mkVector [0, 4.8e4], 3.3e23)
, (mkVector [0, 0], mkVector [0, 0], 1.98e30)
, (mkVector [ 1.08e11, 0], mkVector [0, 3.5e4], 4.87e24)
]
main :: IO ()
main = do
[n] <- fmap (map read) getArgs :: IO [Int]
let states = evolution sys1
finalState = states !! n
finalState `deepseq` (return ())
I wouldn't expect too gain much from parallelisation for only five particles anyhow. Each thread has very little it can do before giving back control, and everything interacts. You therefore quickly notice the overhead even of Haskell's light green threads.
To benefit properly from parallelisation, you'll need to either set up the threads as a thread pool (like you would in traditional number-crunching languages) and thus avoid the sparking overhead, or make sure each thread has a lot to do. This works best if you mostly decouple the threads' domains; in really-many body applications this is done with something like a Barnes-Hut simulation.
Another thing: you're right now using a simple Euler solver for solving the equations of motion. That's a very bad scheme. Switching to a higher-order Runge Kutta solver, ideally with adaptive step size, would allow you to save so much in terms of number of steps that the parallelisation aspect largely pales in impact by comparison.
†Actually, HMatrix also has had a statically checked module for quite some time now.

How do I stop randomness from pervading my code in Haskell?

I am attempting to implement the following algorithm, as detailed here.
Start with a flat terrain (initialize all height values to zero).
Pick a random point on or near the terrain, and a random radius
between some predetermined minimum and maximum. Carefully choosing
this min and max will make a terrain rough and rocky or smooth and
rolling.
Raise a hill on the terrain centered at the point, having the given
radius.
Go back to step 2, and repeat as many times as necessary. The number
of iterations chosen will affect the appearance of the terrain.
However, I start to struggle once I get to the point where I have to select a random point on the terrain. This random point is wrapped in an IO monad, which is then passed up my chain of functions.
Can I cut the IO off at a certain point and, if so, how do I find that point?
The following is my (broken) code. I would appreciate any suggestions on improving it / stopping the randomness from infecting everything.
type Point = (GLfloat, GLfloat, GLfloat)
type Terrain = [Point]
flatTerrain :: Double -> Double -> Double -> Double -> Terrain
flatTerrain width length height spacing =
[(realToFrac x, realToFrac y, realToFrac z)
| x <- [-width,-1+spacing..width], y <- [height], z <- [-length,-1+spacing..length]]
hill :: Terrain -> Terrain
hill terrain = hill' terrain 100
where hill' terrain 0 = terrain
hill' terrain iterations = do
raised <- raise terrain
hill' (raise terrain) (iterations - 1)
raise terrain = do
point <- pick terrain
map (raisePoint 0.1 point) terrain
raisePoint r (cx,cy,cz) (px,py,pz) =
(px, r^2 - ((cx - px)^2 + (cz - pz)^2), pz)
pick :: [a] -> IO a
pick xs = randomRIO (0, (length xs - 1)) >>= return . (xs !!)

The algorithm says that you need to iterate and in each iteration select a random number and update the terrain which can be viewed as generate a list of random points and use this list to update the terrain i.e iteration to generate random numbers == list of random numbers.
So you can do something like:
selectRandomPoints :: [Points] -> Int -> IO [Points] -- generate Int times random points
updateTerrain :: Terrain -> [Points] -> Terrain
-- somewhere in IO
do
pts <- selectRandomPoints allPts iterationCount
let newTerrain = updateTerrain t pts

One of the most useful features of haskell is to know a function is deterministic just based on its type - it makes testing much easier. For this reason, I would base my design on limiting randomness as much as possible, and wrapping the core non random functions with a random variant. This is easily done with the MonadRandom type class, which is the best way of writing code in haskell that requires random values.
For fun, I wrote a console version of that hill generator. It is pretty basic, with a lot of hard coded constants. However, it does provide a pretty cool ascii terrain generator :)
Note with my solution all of the calculations are isolated in pure, non random functions. This could then be tested easily, as the result is deterministic. As little as possible occurs in the IO monad.
import Control.Monad
import Control.Monad.Random
import Data.List
import Data.Function (on)
type Point = (Double, Double, Double)
type Terrain = [Point]
-- Non random code
flatTerrain :: Double -> Double -> Double -> Double -> Terrain
flatTerrain width length height spacing = [(realToFrac x, realToFrac y, realToFrac z)
| x <- [-width,-width+spacing..width], y <- [height], z <- [-length,-length+spacing..length]]
-- simple terrain displayer, uses ascii to render the area.
-- assumes the terrain points are all separated by the same amount
showTerrain :: Terrain -> String
showTerrain terrain = unlines $ map (concat . map showPoint) pointsByZ where
pointsByZ = groupBy ((==) `on` getZ) $ sortBy (compare `on` getZ) terrain
getZ (_, _, z) = z
getY (_, y, _) = y
largest = getY $ maximumBy (compare `on` getY) terrain
smallest = getY $ minimumBy (compare `on` getY) terrain
atPC percent = (largest - smallest) * percent + smallest
showPoint (_, y, _)
| y < atPC (1/5) = " "
| y < atPC (2/5) = "."
| y < atPC (3/5) = "*"
| y < atPC (4/5) = "^"
| otherwise = "#"
addHill :: Double -- Radius of hill
-> Point -- Position of hill
-> Terrain -> Terrain
addHill radius point = map (raisePoint radius point) where
raisePoint :: Double -> Point -> Point -> Point
-- I had to add max py here, otherwise new hills destroyed the
-- old hills with negative values.
raisePoint r (cx,cy,cz) (px,py,pz) = (px, max py (r^2 - ((cx - px)^2 + (cz - pz)^2)), pz)
-- Some random variants. IO is an instance of MonadRandom, so these function can be run in IO. They
-- can also be run in any other monad that has a MonadRandom instance, so they are pretty flexible.
-- creates a random point. Note that the ranges are hardcoded - an improvement would
-- be to be able to specify them, either through parameters, or through reading from a Reader
-- monad or similar
randomPoint :: (MonadRandom m) => m Point
randomPoint = do
x <- getRandomR (-30, 30)
y <- getRandomR (0,10)
z <- getRandomR (-30, 30)
return (x, y, z)
addRandomHill :: (MonadRandom m) => Terrain -> m Terrain
addRandomHill terrain = do
radius <- getRandomR (0, 8) -- hardcoded again
position <- randomPoint
return $ addHill radius position terrain
-- Add many random hills to the Terrain
addRandomHills :: (MonadRandom m) => Int -> Terrain -> m Terrain
addRandomHills count = foldr (>=>) return $ replicate count addRandomHill
-- testing code
test hillCount = do
let terrain = flatTerrain 30 30 0 2
withHills <- addRandomHills hillCount terrain
-- let oneHill = addHill 8 (0, 3, 0) terrain
-- putStrLn $ showTerrain oneHill
putStrLn $ showTerrain withHills
main = test 200
Example output:
... .. ..*. .***^^^***.
... ... .***. .***^^^*^^*.
... .. .*^**......*^*^^^^.
. .***.***. ..*^^^*.
....*^^***^*. .^##^*.
..*.*^^^*****. .^###^..*
.**^^^^.***... .*^#^*.**
.***^##^**..*^^*.*****..**
....***^^##^*.*^##^****. ..
.......*^###^.*###^****.
.*********^###^**^##^***....
*^^^*^##^^^^###^.^^^*. .****..
*^^^^####*^####^..**. .******.
*^^^*####**^###*. .. .*******
*^#^^^##^***^^*. ...........***
*^^^**^^*..*... ..*******...***
.***..*^^*... ..*^^#^^^*......
...*^##^**. .*^^#####*.
.*^##^**....**^^####*. .***
.. ..*^^^*...*...**^^###^* *^#^
..****^^*. .... ...**###^*.^###
..*******.**. ..**^^^#^^..^###
.*****..*^^* ..**^##^**...*^##
.^^^^....*^^*..*^^^##^* ..**^^^
*###^*. .*^**..^###^^^*...*****
^####*.*..*^^*.^###^**.....*..
*###^**^**^^^*.*###^. .. .
.^^^***^^^^#^*.**^^**.
.....***^##^**^^^*^^*.
.*^^##^*^##^^^^^.
.*^^^^*.^##^*^^*.

Nope, you can't escape IO. Perhaps you can do all your randomness up front and rewrite your functions to take that randomness as a parameter; if not, you can use MonadRandom or similar to track a random seed or just put everything in IO.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Mutating data with immutable data structures - haskell

Related

Generate a random list of custom data type, where values provided to data constructor are somehow bounded within a range

Coordinates for clockwise outwards spiral

Debug memory issue in Haskell

Bad performance for parallel N-Body in Haskell

How do I stop randomness from pervading my code in Haskell?

Categories

Resources