Why does folding Events and Behaviors use so much memory? - haskell

I am currently exploring the possibility of using basic containers to give FRP networks more structure, and thereby to create more sophisticated event networks more easily.
Note: I use ordrea but had the same problem with reactive-banana too, so I guess this problem is not specific to the chosen frp implementation.
In this special case I am using a simple Matrix to store Events:
newtype Matrix (w :: Nat) (h :: Nat) v a where
Matrix :: Vector a -> Matrix w h v a
-- deriving instances: Functor, Foldable, Traversable, Applicative
Matrix is basically just a thin wrapper around Data.Vector and most functions I'll use are basically the same as the corresponding Vector ones. The notable exception is indexing, but that should be self explanatory.
With this I can define matrices of events like Matrix 10 10 (Event Double) and am able to define basic convolution algorithms on them:
applyStencil :: (KnownNat w, KnownNat h, KnownNat w', KnownNat h')
=> M.Matrix w' h' (a -> c)
-> M.Matrix w h (Event a)
-> M.Matrix w h (Event c)
applyStencil s m = M.generate stencil
where stencil x y = fold $ M.imap (sub x y) s
sub x0 y0 x y g = g <$> M.clampedIndex m (x0 - halfW + x) (y0 - halfH + y)
halfW = M.width s `div` 2
halfH = M.height s `div` 2
Notes:
M.generate :: (Int -> Int -> a) -> M.Matrix w h a and
M.imap :: (Int -> Int -> a -> b) -> M.Matrix w h a -> M.Matrix w h b
are just wrappers around Vector.generate and Vector.imap respectively.
M.clampedIndex clamps indices into the bounds of the matrix.
Event is an instance of Monoid which is why it is possible to just fold the Matrix w' h' (Event c) returned by M.imap (sub x y) s.
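To make the index arithmetic in applyStencil concrete, here is a minimal pure sketch over a list-of-rows grid. It is not the question's code: the names clampedIndex (reimplemented here for lists), applyStencilPure and gaussian are hypothetical, the stencil holds plain weights instead of functions, and the per-cell results are summed as numbers, whereas the original merges Events through their Monoid instance.

```haskell
-- Assumption: a grid is a non-empty list of equally long rows.
-- Look up (x, y), clamping both coordinates into the grid bounds.
clampedIndex :: [[a]] -> Int -> Int -> a
clampedIndex rows x y = (rows !! cy) !! cx
  where
    h  = length rows
    w  = length (head rows)
    cx = max 0 (min (w - 1) x)
    cy = max 0 (min (h - 1) y)

-- Apply a stencil of weights to every cell of the grid, centring the
-- stencil on the cell and clamping reads that fall outside the grid.
applyStencilPure :: [[Double]] -> [[Double]] -> [[Double]]
applyStencilPure stencil grid =
  [ [ cell x y | x <- [0 .. w - 1] ] | y <- [0 .. h - 1] ]
  where
    h     = length grid
    w     = length (head grid)
    halfW = length (head stencil) `div` 2
    halfH = length stencil `div` 2
    cell x0 y0 = sum
      [ wgt * clampedIndex grid (x0 - halfW + sx) (y0 - halfH + sy)
      | (sy, row) <- zip [0 ..] stencil
      , (sx, wgt) <- zip [0 ..] row
      ]

-- The 3x3 kernel from the question, as plain weights.
gaussian :: [[Double]]
gaussian = map (map (/ 16)) [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
```

Because the weights sum to 1, applying gaussian to a grid of constant values returns the same constant everywhere, including at the clamped borders.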
I have a setup approximately like this:
let network = do
-- inputs triggered from external events
let inputs :: M.Matrix 128 128 (Event Double)
-- stencil used:
let stencil :: M.Matrix 3 3 (Double -> Double)
stencil = fmap ((*) . (/16)) $ M.fromList [1,2,1,2,4,2,1,2,1]
-- convolute matrix by applying stencil
let convoluted = applyStencil stencil inputs
-- collect events in order to display them later
-- type: M.Matrix 128 128 (Behavior [Double])
let behaviors = fmap eventToBehavior convoluted
-- now there is a neat trick you can play because Matrix
-- is Traversable and Behaviors are Applicative:
-- type: Behavior (Matrix 128 128 [Double])
return $ Data.Traversable.sequenceA behaviors
Using something like this I am triggering ~15k events/s with no problems and lots of headroom in that regard.
Problem is that as soon as I sample the network I can only get about two samples per second from it:
main :: IO ()
main = do
-- initialize the network
sample <- start network
forever $ do
-- not all of the 128*128 inputs are triggered each "frame"
triggerInputs
-- sample the network
mat <- sample
-- display the matrix somehow (actually with gloss)
displayMatrix mat
So far I have made the following observations:
Profiling tells me that productivity is very low (4%-8%)
Most of the time is spent by the garbage collector in Gen 1 (~95%)
Data.Matrix.foldMap (ie fold) is allocating the most memory (~45%, as per -p)
When I was still working with reactive-banana Heinrich Apfelmus recommended that tree based traversals are a better fit for behaviors¹. I tried that for sequenceA, fold and traverse with no success.
I suspected that the newtype wrapper was preventing vector's fusion rules from firing². This is most likely not the culprit.
At this point I have spent the better part of the week searching for a solution to this problem. Intuitively I'd say that sampling should be much faster and foldMap should not create so much garbage memory. Any ideas?
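For readers unfamiliar with the "tree based traversal" idea mentioned above, here is a minimal sketch of a balanced monoidal fold over a list. The name foldTree is hypothetical and this is not the code used in the question; it only illustrates combining neighbours pairwise so that the nesting depth of (<>) is logarithmic rather than linear in the number of elements.

```haskell
import Data.Monoid (Sum (..))

-- Combine elements pairwise, level by level, preserving their order.
foldTree :: Monoid m => [m] -> m
foldTree []  = mempty
foldTree [x] = x
foldTree xs  = foldTree (pairUp xs)
  where
    pairUp (a : b : rest) = (a <> b) : pairUp rest
    pairUp rest           = rest

-- Small checks: order is preserved and the result matches mconcat.
exampleString :: String
exampleString = foldTree ["a", "b", "c", "d", "e"]

exampleSum :: Int
exampleSum = getSum (foldTree (map Sum [1 .. 10]))
```

For a lawful Monoid the result equals mconcat; only the association of the intermediate combinations differs.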

Related

Alpha Beta Pruning with Recursion Schemes

I'm trying to get more proficient with recursion schemes as they have so far been really helpful for turning gnarly explicit recursion code into something less spike-y. One of the other tools I tend to reach for when implementing algorithms that can get really confusing with explicit recursion is monad transformers / mutability. Ideally I'd like to get comfortable enough with recursion schemes such that I can ditch statefulness altogether. An example of an algorithm I'd still reach for the transformers for is minimax with alpha beta pruning. I did normal minimax with a catamorphism and minimax f-algebra (data MinimaxF a f = MMResult a | MMState [f] Bool), but I wasn't sure how I could extend this to do alpha beta pruning. I thought maybe I could use histomorphism, or maybe there was some custom solution with comonads, but I didn't know how to approach trying a solution using either technique.
In addition to a version of alpha beta pruning with recursion schemes any general advice you have about tackling similar problems would be much appreciated. For example I've had trouble applying recursion schemes to algorithms like Dijkstra that usually are implemented in an imperative fashion.
Alpha-beta can be seen as an instance of minimax, where min and max are instantiated using a well-chosen lattice. Full gist.
We represent games as a tree, where each internal node is a position in the game, waiting for a designated player to pick a move to a child node, and each leaf is a final position with its score, or value.
-- | At every step, either the game ended with a value/score,
-- or one of the players is to play.
data GameF a r = Value a | Play Player (NonEmpty r)
deriving Functor
type Game a = Fix (GameF a)
-- | One player wants to maximize the score,
-- the other wants to minimize the score.
data Player = Mini | Maxi
minimax will work on any lattice, defined by the following class:
class Lattice l where
inf, sup :: l -> l -> l
The Lattice class is more general than Ord: an Ord instance is a Lattice with decidable equality (Eq). If we could redefine Ord, then it would be appropriate to add Lattice as a superclass. But here a newtype will have to do:
-- The Lattice induced by an Ord
newtype Order a = Order { unOrder :: a }
deriving (Eq, Ord)
instance Ord a => Lattice (Order a) where
inf = min
sup = max
Here's minimax. It is parameterized by an embedding leaf :: a -> l of final values to the chosen lattice. One player maximizes the embedded value, the other player minimizes it.
-- | Generalized minimax
gminimax :: Lattice l => (a -> l) -> Game a -> l
gminimax leaf = cata minimaxF where
minimaxF (Value x) = leaf x
minimaxF (Play p xs) = foldr1 (lopti p) xs
lopti :: Lattice l => Player -> l -> l -> l
lopti Mini = inf
lopti Maxi = sup
The "regular" minimax uses the scores of the game directly as the lattice:
minimax :: Ord a => Game a -> a
minimax = unOrder . gminimax Order
For alpha-beta pruning, the idea is that we can keep track of some bounds on the optimal score, and this allows us to short-circuit the search. So the search is to be parameterized by that interval (alpha, beta). This leads us to a lattice of functions Interval a -> a:
newtype Pruning a = Pruning { unPruning :: Interval a -> a }
An interval can be represented by (Maybe a, Maybe a) to allow either side to be unbounded. But we shall use better named types for clarity, and also to leverage a different Ord instance on each side:
type Interval a = (WithBot a, WithTop a)
data WithBot a = Bot | NoBot a deriving (Eq, Ord)
data WithTop a = NoTop a | Top deriving (Eq, Ord)
We will require that we can only construct Pruning f if f satisfies clamp i (f i) = clamp i (f (Bot, Top)), where clamp is defined below. That way, f is a search algorithm which may shortcircuit if it learns that its result lies outside of the interval, without having to find the exact result.
clamp :: Ord a => Interval a -> a -> a
clamp (l, r) = clampBot l . clampTop r
clampBot :: Ord a => WithBot a -> a -> a
clampBot Bot x = x
clampBot (NoBot y) x = max y x
clampTop :: Ord a => WithTop a -> a -> a
clampTop Top x = x
clampTop (NoTop y) x = min y x
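As a quick illustration of clamp, the following self-contained sketch repeats the definitions above and evaluates a few cases. The name examples and the chosen bounds are only for illustration.

```haskell
data WithBot a = Bot | NoBot a deriving (Eq, Ord)
data WithTop a = NoTop a | Top deriving (Eq, Ord)

type Interval a = (WithBot a, WithTop a)

clamp :: Ord a => Interval a -> a -> a
clamp (l, r) = clampBot l . clampTop r

clampBot :: Ord a => WithBot a -> a -> a
clampBot Bot       x = x
clampBot (NoBot y) x = max y x

clampTop :: Ord a => WithTop a -> a -> a
clampTop Top       x = x
clampTop (NoTop y) x = min y x

-- Values above, below and inside the interval [0, 10],
-- followed by the unbounded interval, which leaves values alone.
examples :: [Int]
examples =
  [ clamp (NoBot 0, NoTop 10) 15    -- 10
  , clamp (NoBot 0, NoTop 10) (-3)  -- 0
  , clamp (NoBot 0, NoTop 10) 7     -- 7
  , clamp (Bot, Top) 15             -- 15
  ]
```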
Functions form a lattice by pointwise lifting. And when we consider only functions satisfying clamp i (f i) = clamp i (f (Bot, Top)) and equate them modulo a suitable equivalence relation (Pruning f = Pruning g if clamp <*> f = clamp <*> g), a short-circuiting definition of the lattice becomes possible.
The inf of two functions l and r, given an interval i = (alpha, beta), first runs l (alpha, beta) to obtain a value vl.
If vl <= alpha, then it must be clamp i vl == alpha == clamp i (min vl (r i)) so we can stop and return vl without looking at r. Otherwise, we run r, knowing that the final result is not going to be more than vl so we can also update the upper bound passed to r. sup is defined symmetrically.
instance Ord a => Lattice (Pruning a) where
inf l r = Pruning \(alpha, beta) ->
let vl = unPruning l (alpha, beta) in
if NoBot vl <= alpha then vl else min vl (unPruning r (alpha, min (NoTop vl) beta))
sup l r = Pruning \(alpha, beta) ->
let vl = unPruning l (alpha, beta) in
if beta <= NoTop vl then vl else max vl (unPruning r (max (NoBot vl) alpha, beta))
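The short-circuit in inf can be observed directly. The sketch below repeats only the pieces it needs, written with an explicit $ instead of BlockArguments, and pairs a leaf with a right-hand search that fails if it is ever run. The names infP, leafP and unexplored are hypothetical helpers for this illustration.

```haskell
data WithBot a = Bot | NoBot a deriving (Eq, Ord)
data WithTop a = NoTop a | Top deriving (Eq, Ord)

type Interval a = (WithBot a, WithTop a)

newtype Pruning a = Pruning { unPruning :: Interval a -> a }

-- Same logic as inf in the Lattice instance above.
infP :: Ord a => Pruning a -> Pruning a -> Pruning a
infP l r = Pruning $ \(alpha, beta) ->
  let vl = unPruning l (alpha, beta)
  in if NoBot vl <= alpha
       then vl
       else min vl (unPruning r (alpha, min (NoTop vl) beta))

leafP :: a -> Pruning a
leafP = Pruning . const

-- A subtree that must never be searched.
unexplored :: Pruning Int
unexplored = Pruning (\_ -> error "right subtree was searched")

-- With alpha = 5, the left value 1 is already at or below the lower
-- bound, so the right subtree is skipped and 1 is returned.
prunedResult :: Int
prunedResult = unPruning (infP (leafP 1) unexplored) (NoBot 5, Top)

-- With an unbounded interval nothing can be pruned: min 7 3 = 3.
fullResult :: Int
fullResult = unPruning (infP (leafP 7) (leafP 3)) (Bot, Top)
```

Note that prunedResult returns 1 rather than the lower bound 5: the result is only guaranteed to be correct after clamping to the interval, which is exactly the invariant stated for Pruning.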
Thus we obtain alpha-beta as an instance of minimax. Once the lattice above is defined, we only need some simple wrapping and unwrapping.
alphabeta :: Ord a => Game a -> a
alphabeta = runPruning . gminimax constPruning
constPruning :: a -> Pruning a
constPruning = Pruning . const
runPruning :: Pruning a -> a
runPruning f = unPruning f (Bot, Top)
If all goes well, alphabeta and minimax should have the same result:
main :: IO ()
main = quickCheck \g -> minimax g === alphabeta (g :: Game Int)

Avoiding thunks in sparsely evaluated list generated by monadic unfold

I have a simulation library that uses the FFI wrapped in a monad M, carrying a context. All the foreign functions are pure, so I've decided to make the monad lazy, which is normally convenient for flow-control. I represent my simulation as a list of simulation-frames, that I can consume by either writing to a file, or by displaying the frame graphically.
simulation :: [(Frame -> M Frame)] -> Frame -> M [Frame]
simulation [] frame = return [frame]
simulation (step:steps) frame
= step frame >>= fmap (frame:) . simulation steps
Each frame consists of a tuple of newtype-wrapped ForeignPtrs that I can lift to my Haskell representation with
lift :: Frame -> M HFrame
Since the time-steps in my simulation are quite short, I only want to look at every n frames, for which I use
takeEvery n l = foldr cons nil l 0 where
nil _ = []
cons x rest 0 = x : rest n
cons x rest n = rest (n-1)
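To see what takeEvery produces, here is the same function with an explicit type signature and the inner counter renamed to k so that it no longer shadows n. The names firstExample and zeroExample are only for illustration.

```haskell
-- Keep one element, then skip the next n elements, and repeat.
takeEvery :: Int -> [a] -> [a]
takeEvery n l = foldr cons nil l 0
  where
    nil _ = []
    cons x rest 0 = x : rest n
    cons _ rest k = rest (k - 1)

-- Every third element of [0 .. 9], starting with the first.
firstExample :: [Int]
firstExample = takeEvery 2 [0 .. 9]  -- [0,3,6,9]

-- Skipping nothing returns the list unchanged.
zeroExample :: [Int]
zeroExample = takeEvery 0 [1, 2, 3]  -- [1,2,3]
```

As written, takeEvery n keeps every (n+1)-th element, so n counts the frames skipped between two kept frames.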
So my code looks something like
main = consume
$ takeEvery n
$ runM
$ simulation steps initialFrame >>= mapM lift
Now, the problem is that as I increase n, a thunk builds up. I've tried a couple of different ways to try to strictly evaluate each frame in simulation, but I have yet to figure out how to do so. ForeignPtr doesn't appear to have a NFData instance, so I can't use deepseq, but all my attempts with seq, including using seq on each element in the tuple, have been without noticeable effect.
EDIT:
Upon request, I have included more specifics, that I initially excluded since I think they are probably mostly noise for this question.
The monad
newtype FT c a = FT (Context -> a)
instance Functor (FT c) where
fmap f (FT a) = FT (f.a)
instance Applicative (FT c) where
pure a = FT (\_ -> a)
(<*>) (FT a) (FT b) = FT (\c -> a c $ b c)
instance Monad (FT c) where
return = pure
(>>=) (FT a) f = FT (\c -> (\(FT b) -> b c) $ f $ a c)
runFTIn :: Context -> (forall c. FT c a) -> a
runFTIn context (FT a) = a context
runFTWith :: [ContextOption] -> (forall c. FT c a) -> a
runFTWith options a
= unsafePerformIO
$ getContext options >>= \c -> return $ runFTIn c a
runFT = runFTWith []
unsafeLiftFromIO :: (Context -> IO a) -> FT c a
unsafeLiftFromIO a = FT (\c -> unsafePerformIO $ a c)
All the foreign functions are lifted from IO with unsafeLiftFromIO
newtype Box c = Box (ForeignPtr RawBox)
newtype Coordinates c = Coordinates (ForeignPtr RawCoordinates)
type Frame c = (Box c, Coordinates c)
liftBox :: Box c -> FT c HBox
liftCoordinates :: Coordinates c -> FT c HCoordinates
liftFrame (box, coordinates) = do
box' <- liftBox box
coordinates' <- liftCoordinates coordinates
return (box', coordinates')
The steps themselves are supposed to be arbitrary (Frame c -> FT c (Frame c)), so strictness should preferably be in the higher level code.
EDIT2:
I have now tried to use Streamly, however the problem persists, so I think the issue really is finding a way to strictly evaluate ForeignPtrs.
current implementations:
import Streamly
import qualified Streamly.Prelude as S
import qualified Streamly.Internal.Data.Stream.Serial as Serial
takeEvery n = Serial.unfoldrM ((fmap.fmap) (\(h, t) -> (h, S.drop (n-1) t)) . S.uncons)
(#) = flip ($)
simulation
:: (IsStream t)
=> Frame c
-> t (FT c) (Frame c -> FT c (Frame c))
-> t (FT c) (Frame c)
simulation frame = S.scanlM' (#) frame
EDIT3:
To clarify the symptoms and how I have diagnosed the problem.
The library calls OpenCL functions running on a GPU. I am sure that the freeing of the pointers is handled correctly - the ForeignPtrs have the correct freeing functions, and memory use is independent of total number of steps as long as this number is larger than n. What I find is that memory use on the GPU is basically linearly correlated to n. The consumer I've been using for this testing is
import qualified Data.ByteString.Lazy as BL
import Data.Binary
import Data.Binary.Put
writeTrajectory fn = fmap (BL.writeFile fn . runPut) . S.foldr ((>>).putFrame) (pure ()) . serially
For my streamly implementation, and
writeTrajectory fn = BL.writeFile fn . runPut . mapM_ putFrame
For the original implementation. Both should consume the stream continuously. I've generated the steps for testing with replicate.
I am unsure of how to more precisely analyze the memory-use on the GPU. System memory use is not an issue here.
Update:
I am starting to think it's not a matter of strictness, but of GC problems. The run-time system does not know the size of the memory allocated on the GPU and so does not know to collect the pointers. This is less of an issue when there is stuff going on CPU-side as well, as that will produce allocations too, activating the GC. This would explain the slightly non-deterministic memory usage, but linear correlation to n that I've seen. How to solve this nicely is another issue, but I suspect there will be a substantial overhaul to my code.
"I think the issue really is finding a way to strictly evaluate ForeignPtrs"
If that is really the issue, one way to do that is to change the second clause of simulation:
{-# LANGUAGE BangPatterns #-}
simulation :: [(Frame -> M Frame)] -> Frame -> M [Frame]
simulation [] frame = return [frame]
simulation (step:steps) frame@(!_, !_) -- Evaluate both components of the pair
= step frame >>= fmap (frame:) . simulation steps
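If the cause is instead the one suspected in the question's update, namely that finalizers run late because the runtime sees little heap pressure, one possible workaround is to request a collection periodically from the consumer. The sketch below is an assumption-laden illustration, not a tested fix for the OpenCL case: consumeWithGC is a hypothetical helper, and running a major GC only makes it likely, not certain, that finalizers of unreachable ForeignPtrs run soon. Where the lifetime of a frame is known exactly, finalizeForeignPtr from Foreign.ForeignPtr is a more direct alternative.

```haskell
import Control.Monad (forM_, when)
import System.Mem (performMajorGC)

-- Consume frames one by one and request a major GC after every
-- `every` frames (values below 1 are treated as 1).
consumeWithGC :: Int -> (a -> IO ()) -> [a] -> IO ()
consumeWithGC every act frames =
  forM_ (zip [1 :: Int ..] frames) $ \(i, frame) -> do
    act frame
    when (i `mod` max 1 every == 0) performMajorGC
```

A call such as consumeWithGC 100 writeFrame frames, with writeFrame standing in for whatever consumes a single frame, trades some CPU time for a bound on how many dead frames can pile up between collections.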

Using show on a custom type

I'm having trouble printing the contents of a custom matrix type I made. When I try to, it tells me
Ambiguous occurrence `show'
It could refer to either `MatrixShow.show',
defined at Matrices.hs:6:9
or `Prelude.show',
imported from `Prelude' at Matrices.hs:1:8-17
Here is the module I'm importing:
module Matrix (Matrix(..), fillWith, fromRule, numRows, numColumns, at, mtranspose, mmap) where
newtype Matrix a = Mat ((Int,Int), (Int,Int) -> a)
fillWith :: (Int,Int) -> a -> (Matrix a)
fillWith (n,m) k = Mat ((n,m), (\(_,_) -> k))
fromRule :: (Int,Int) -> ((Int,Int) -> a) -> (Matrix a)
fromRule (n,m) f = Mat ((n,m), f)
numRows :: (Matrix a) -> Int
numRows (Mat ((n,_),_)) = n
numColumns :: (Matrix a) -> Int
numColumns (Mat ((_,m),_)) = m
at :: (Matrix a) -> (Int, Int) -> a
at (Mat ((n,m), f)) (i,j)| (i > 0) && (j > 0) || (i <= n) && (j <= m) = f (i,j)
mtranspose :: (Matrix a) -> (Matrix a)
mtranspose (Mat ((n,m),f)) = (Mat ((m,n),\(j,i) -> f (i,j)))
mmap :: (a -> b) -> (Matrix a) -> (Matrix b)
mmap h (Mat ((n,m),f)) = (Mat ((n,m), h.f))
This is my module:
module MatrixShow where
import Matrix
instance (Show a) => Show (Matrix a) where
show (Mat ((x,y),f)) = show f
Also is there some place where I can figure this out on my own, some link with instructions or some tutorial or something to learn how to do this.
The problem is with your indentation. The definition of show needs to be indented relative to the instance Show a => Show (Matrix a). As it is, it appears that you are trying to define a new function called show, unrelated to the Show class, which you can't do.
@dfeuer, whose name I continue to have trouble spelling, has given you the direct answer - Haskell is sensitive to layout - but I'm going to try to help you with the underlying question that you've alluded to in the comments, without giving you the full answer.
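The layout rule can be seen on a smaller example. The sketch below uses a hypothetical Wrapper type rather than the question's Matrix, because the function stored inside Mat has no Show instance and so show f would not compile even once the indentation is fixed.

```haskell
-- A hypothetical one-field type, used only to show the layout.
newtype Wrapper a = Wrapper a

instance Show a => Show (Wrapper a) where
  -- Indented, so this defines the class method `show` for Wrapper.
  -- Starting the line in column 0 would instead declare a new
  -- top-level function named `show`.
  show (Wrapper x) = "Wrapper " ++ show x

example :: String
example = show (Wrapper (3 :: Int))  -- "Wrapper 3"
```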
You mentioned that you were confused about how matrices are represented. Read the source, Luke:
newtype Matrix a = Mat ((Int,Int), (Int,Int) -> a)
This newtype declaration tells you that a Matrix is formed from a pair ((Int,Int), (Int,Int) -> a). If you split up the tuple, that's an (Int, Int) pair and a function of type (Int, Int) -> a (a function with two integer arguments which returns something of arbitrary type a). This suggests to me that the first part of the tuple represents the size of the matrix, and the second part is a function mapping coordinates onto elements. This hypothesis seems to be confirmed by some of the example code your professor has given you - have a look at at or mtranspose, for example.
So, the question is - given the width and height of the matrix, and a function which will give you the element at a given coordinate, how do we give a string showing the items in the matrix?
The first thing we need to do is enumerate all the possible coordinates for the given width and height of the matrix. Haskell provides some useful syntactic constructs for this sort of operation - we can write [x .. y] to enumerate all the values between x and y, and use a list comprehension to unpack those enumerations in a nested loop.
coords :: (Int, Int) -- (width, height)
-> [(Int, Int)] -- (x, y) pairs
coords (w, h) = [(x, y) | x <- [0 .. w], y <- [0 .. h]]
For example:
ghci> coords (2, 4)
[(0,0),(0,1),(0,2),(0,3),(0,4),(1,0),(1,1),(1,2),(1,3),(1,4),(2,0),(2,1),(2,2),(2,3),(2,4)]
Now that we've worked out how to list all the possible coordinates in a matrix, how do we turn coordinates into elements of type a? Well, the Mat constructor contains a function (Int, Int) -> a which gives you the element associated with a single coordinate. We need to apply that function to each of the coordinates in the list which we just enumerated. This is what map does.
elems :: Matrix a -> [a]
elems (Mat (size, f)) = map f $ coords size
So, there's the code to enumerate the elements of a matrix. Can you figure out how to modify this code so that a) it shows the elements as a string and b) it shows them in a row-by-row fashion? You'll probably need to adjust both of these functions.
I suppose the broader point I'd like to make is that even though it feels like your professor has thrown you into the deep end, it's always possible to do a little detective work and figure out for yourself what something means. Many - most? - of the people answering questions on this site are self-taught programmers, myself included. We persevered!
After all, it's just code. If a computer's going to understand it then it must be written down on the page, and that means that you can understand it, too.

How can I generalize my sampling framework?

In the context of a stochastic ray tracer, I'd like to decouple the MC integration (path tracing, bidirectional path tracing) from sample generation (uniform random, stratified, poisson, metropolis, ...). Most of this is already implemented, but it's tedious to use. So I ditched that and am trying to build something nicer, by splitting sampled computations into two phases: In SampleGen you are allowed to request a random value using the mk1d and mk2d functions, which are then supplied with actual Floats by the sampling algorithm. Those values can be examined in SampleRun to do the actual computation. Here's some code with the interesting bits of a stratified sampler and its use:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Applicative
import Control.Monad.State.Strict
import Control.Monad.Primitive
import System.Random.MWC as MWC
-- allows to construct sampled computations
newtype SampleGen s m a = SampleGen (StateT s m a)
deriving ( Functor, Applicative, Monad
, MonadState s, MonadTrans )
-- allows to evaluate sampled computations constructed in SampleGen
newtype SampleRun s m a = SampleRun (StateT s m a)
deriving ( Functor, Applicative, Monad
, MonadState s )
-- a sampled computation, parametrized over the generator's state g,
-- the evaluator's state r, the underlying monad m and the result
-- type a
type Sampled g r m a = SampleGen g m (SampleRun r m a)
----------------------
-- Stratified Sampling
----------------------
-- | we just count the number of requested 1D samples
type StratGen = Int
-- | the pre-computed values and a RNG for additional ones
type StratRun m = ([Float], Gen (PrimState m))
-- | specialization of Sampled for stratified sampling
type Stratified m a = Sampled StratGen (StratRun m) m a
-- | gives a sampled value in [0..1), this is kind
-- of the "prime" value, upon which all computations
-- are built
mk1d :: PrimMonad m => Stratified m Float
mk1d = do
n1d <- get
put $ n1d + 1
return $ SampleRun $ do
fs <- gets fst
if length fs > n1d
then return (fs !! n1d)
else gets snd >>= lift . MWC.uniform
-- | gives a pair of stratified values, should really also
-- be a "prime" value, but here we just construct them
-- from two 1D samples for fun
mk2d :: (Functor m, PrimMonad m) => Stratified m (Float, Float)
mk2d = mk1d >>= \f1 -> mk1d >>= \f2 ->
return $ (,) <$> f1 <*> f2
-- | evaluates a stratified computation
runStratified
:: (PrimMonad m)
=> Int -- ^ number of samples
-> Stratified m a -- ^ computation to evaluate
-> m [a] -- ^ the values produced, a list of nsamples values
runStratified nsamples (SampleGen c) = do
(SampleRun x, n1d) <- runStateT c 0
-- let's just pretend I'd use n1d to actually
-- compute stratified samples
gen <- MWC.create
replicateM nsamples $ evalStateT x ([{- samples would go here -}], gen)
-- estimate Pi by Monte Carlo sampling
-- mcPi :: (Functor m, PrimMonad m) => Sampled g r m Float
mcPi :: (Functor m, PrimMonad m) => Stratified m Float
mcPi = do
v <- mk2d
return $ v >>= \(x, y) -> return $ if x * x + y * y < 1 then 4 else 0
main :: IO ()
main = do
vs <- runStratified 10000 mcPi :: IO [Float]
print $ sum vs / fromIntegral (length vs)
The missing part here is that in its current form, the mcPi function has the type
mcPi :: (Functor m, PrimMonad m) => Stratified m Float
while it should really be something like
mcPi :: (Functor m, PrimMonad m) => Sampled g r m Float
Admittedly, the four type parameters on Sampled aren't exactly beautiful, but at least something like this would be useful. In summary, I'm looking for something that allows expressing computations like mcPi independently of the sampling algorithm, e.g.:
a uniform random sampler does not need to maintain any state in the SampleGen phase, and needs only a RNG in the SampleRun phase
both the stratified and the poisson disk sampler (and probably others) keep track of the number of 1D and 2D samples needed and precompute them into a vector, and they would be allowed to share a SampleGen and SampleRun implementation, to differ only in what happens in between SampleGen and SampleRun (how the vector is actually filled)
a metropolis sampler would use a lazy sample generation technique in its SampleRun phase
I'd like to compile it using GHC, so extensions like MultiParamTypeClasses and TypeFamilies are ok to me, but I did not come up with anything remotely usable.
PS: As motivation, some pretty pictures. And the code in its current form is on GitHub
I'm going to start off with a radically different question, "What should the code look like?", and then work towards the question "How is the sampling framework put together?".
What the code should look like
The definition of mcPi should be
mcPi :: (Num s, Ord s, Num p) => s -> s -> p
mcPi x y = if x * x + y * y < 1 then 4 else 0
The Monte Carlo estimation of pi is that, given two numbers (that happen to come from the interval [0..1)) pi is the area of a square if they fall within a circle, otherwise it's 0. The Monte Carlo estimation of pi doesn't know anything about computation. It doesn't know if it's going to be repeated, or anything about where the numbers came from. It does know that the numbers should be uniformly distributed over the square, but that's a topic for a different question. The Monte Carlo estimation of pi is just a function from the samples to the estimate.
Other random things will know that they are part of a random process. A simple random process might be: flip a coin, if the coin comes up "heads", flip it again.
simpleRandomProcess :: (Monad m, MonadCoinFlip m) => m Coin
simpleRandomProcess =
do
firstFlip <- flipACoin
case firstFlip of
Heads -> flipACoin
Tails -> return firstFlip
This random process would like to be able to see things like
data Coin = Heads | Tails
class MonadCoinFlip m where
flipACoin :: m Coin -- The coin should be fair
Random processes may change how much random data they need based on the results of previous experiments. This suggests that we will ultimately need to provide a Monad.
The interface
You would like to "decouple the MC integration (path tracing, bidirectional path tracing) from sample generation (uniform random, stratified, poisson, metropolis, ...)". In your examples, they all want to sample floats. That suggests the following class
class MonadSample m where
sample :: m Float -- Should be on the interval [0..1)
This is very similar to the existing MonadRandom class, except for two things. A MonadRandom implementation essentially needs to provide a uniformly random Int in some range of its own choosing. Your sampler will provide a Float sample of unknown distribution on the interval [0..1). This is different enough to justify having your own new class.
Due to the upcoming Monad Applicative change, I'm instead going to suggest a different name for this class, SampleSource.
class SampleSource f where
sample :: f Float -- Should be on the interval [0..1)
sample replaces mk1d in your code. mk2d can also be replaced, again not knowing what the source of the samples will be. sample2d, the replacement for mk2d, will work with any Applicative sample source, it doesn't need it to be a Monad. The reason it doesn't need a Monad is it won't decide how many samples to get, or what else to do, based on the result of samples; the structure of its computation is known ahead of time.
sample2d :: (Applicative f, SampleSource f) => f (Float, Float)
sample2d = (,) <$> sample <*> sample
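To show that sample2d really needs nothing beyond Applicative, here is a minimal sketch with a toy sample source that reads Floats from a supplied list. FromList and its fallback to 0 once the list is exhausted are assumptions made for this illustration only; they are not part of the design above.

```haskell
class SampleSource f where
  sample :: f Float  -- Should be on the interval [0..1)

-- A toy source: consumes pre-supplied samples from a list.
newtype FromList a = FromList { runFromList :: [Float] -> (a, [Float]) }

instance Functor FromList where
  fmap f (FromList g) = FromList $ \s ->
    let (a, s') = g s in (f a, s')

instance Applicative FromList where
  pure a = FromList $ \s -> (a, s)
  FromList ff <*> FromList fa = FromList $ \s ->
    let (f, s')  = ff s
        (a, s'') = fa s'
    in (f a, s'')

instance SampleSource FromList where
  sample = FromList $ \s -> case s of
    (x : rest) -> (x, rest)
    []         -> (0, [])  -- assumption: fall back to 0 when exhausted

sample2d :: (Applicative f, SampleSource f) => f (Float, Float)
sample2d = (,) <$> sample <*> sample

-- Draws the first two supplied values and leaves the rest untouched.
example :: ((Float, Float), [Float])
example = runFromList sample2d [0.25, 0.5, 0.75]
```

No Monad instance is defined for FromList, yet sample2d type-checks and runs against it.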
If you are going to allow the sample source to introduce interactions between dimensions, for example for Poisson disk sampling, you'd need to add that to the interface, either explicitly enumerating the dimensions
class SampleSource f where
sample :: f Float
sample2d :: f (Float, Float)
sample3d :: f (Float, Float, Float)
sample4d :: f (Float, Float, Float, Float)
or using some vector library.
class SampleSource f where
sample :: f Float
samples :: Int -> f (Vector Float)
Implementing the interface
Now, we need to describe how each of your sample sources can be used as a SampleSource. As an example, I'll implement SampleSource for one of the worst sample sources there is.
newtype ZeroSampleSourceT m a = ZeroSampleSourceT {
unZeroSampleSourceT :: IdentityT m a
} deriving (MonadTrans, Monad, Functor, MonadPlus, Applicative, Alternative, MonadIO)
instance (Monad m) => SampleSource (ZeroSampleSourceT m) where
sample = return 0
runZeroSampleSourceT :: (Monad m) => ZeroSampleSourceT m a -> m a
runZeroSampleSourceT = runIdentityT . unZeroSampleSourceT
When all Monads are Applicative I'd instead write
instance (Applicative f) => SampleSource (ZeroSampleSourceT f) where
sample = pure 0
I'll also implement an MWC uniform SampleSource.
newtype MWCUniformSampleSourceT m a = MWCUniformSampleSourceT {
unMWCUniformSampleSourceT :: ReaderT (Gen (PrimState m)) m a
} deriving (MonadTrans, Monad, Functor, MonadPlus, Applicative, Alternative, MonadIO)
runMWCUniformSampleSourceT :: MWCUniformSampleSourceT m a -> (Gen (PrimState m)) -> m a
runMWCUniformSampleSourceT = runReaderT . unMWCUniformSampleSourceT
-- MWC's uniform generates floats on the open-closed interval (0,1]
uniformClosedOpen :: PrimMonad m => Gen (PrimState m) -> m Float
uniformClosedOpen = fmap (\x -> x - 2**(-33)) . uniform
instance (PrimMonad m) => SampleSource (MWCUniformSampleSourceT m) where
sample = MWCUniformSampleSourceT . ReaderT $ uniformClosedOpen
We won't completely implement Stratified or runStratified, since your example code doesn't contain complete implementations for them.
But I want to know how many samples will be used ahead of time
I'm not sure exactly what you are trying to do with "stratified" sampling. Pre-generating numbers, and using a generator when those run out isn't what I understand stratified sampling to be. If you are going to provide a monadic interface to something, you won't be able to tell ahead of time what will be executed, so you won't be able to predict how many samples a computation will need before you start executing it. If you can settle for only an Applicative interface, then you can test ahead of time how many samples will be needed by the entire computation.
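The claim that an Applicative interface lets you count samples before running anything can be sketched with a source that carries only a counter. Count and sixPairs are hypothetical names for this illustration; the instance never produces a value, it only records how many times sample was requested.

```haskell
class SampleSource f where
  sample :: f Float

-- A "source" that produces no values and only counts requests.
newtype Count a = Count { getCount :: Int }

instance Functor Count where
  fmap _ (Count n) = Count n

instance Applicative Count where
  pure _ = Count 0
  Count m <*> Count n = Count (m + n)

instance SampleSource Count where
  sample = Count 1

sample2d :: (Applicative f, SampleSource f) => f (Float, Float)
sample2d = (,) <$> sample <*> sample

-- Six 2d samples, built with sequenceA only.
sixPairs :: (Applicative f, SampleSource f) => f [(Float, Float)]
sixPairs = sequenceA (replicate 6 sample2d)

needed2d, neededSix :: Int
needed2d  = getCount sample2d  -- 2
neededSix = getCount sixPairs  -- 12
```

Count cannot be given a lawful Monad instance, since (>>=) would need a real value to pass to the continuation. That is the same reason a monadic computation cannot be counted ahead of time.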
But Poisson Disk sampling needs to know how many points are being sampled ahead of time
If a single sampling can depend on both the number of samples needed and the number of dimensions, like in Poisson Disk sampling, those need to be passed to the sampler when they become known.
class SampleSource f where
sample :: f Float
samples :: Int -> f ([Float])
sampleN :: Int -> f (Vector Float)
samplesN :: Int -> Int -> f ([Vector Float])
You could generalize this to sampling in arbitrary shapes in arbitrary dimensions, which is what we'd need to do if we took the next leap.
Applicative query language with a Monadic interpreter
We can go, very, very elaborate and make an Applicative query language for requests for samples. The language will need to add two features on top of what Applicative already does. It will need to be able to repeat requests and it will need to group requests for samples together to identify which groupings are meaningful. It's motivated by the following code, which wants to get 6 different 2d samples, where sample2d is the same as our first definition.
take 6 (repeat sample2d)
First, we'll need to be able to repeat things over and over. The nicest way to do this would be if we could write, e.g.
take 6 (repeat sample) :: SampleSource f => [f Float]
We'd need a way to go from an [f a] to f [a]. This already exists; it's Data.Traversable's sequenceA, which requires that f be Applicative. So we already get repetition from Applicative.
sequenceA . take 6 . repeat $ sample2d
To group requests together, we'll add a function to mark which groupings are meaningful.
sequenceA . take 6 . repeat . mark $ sample2d
and a class for things that can mark some grouping. If we need more meaning than just groupings - for example if the internal things should be dependent or independent, we'd add it here.
class Mark f where
mark :: f a -> f a
If everything is going to be very homogeneous, we might add a class for query-able sample sources
class (Applicative f, Mark f, SampleSource f) => QueryableSampleSource f where
Now we will talk about the idea of a monad that has a more-optimized query language. Here we will start using all of those GHC-specific extensions; specifically TypeFamilies.
class MonadQuery m where
type Query m :: * -> *
interpret :: (Query m a) -> m a
And finally a class for monad sample sources with an Applicative query language
class (MonadQuery m, QueryableSampleSource (Query m), SampleSource m, Monad m) => MonadSample m where
At this point, we will want to work out what laws these should follow. I'd suggest a few:
interpret sample == sample
interpret (sequenceA a) == sequence (map interpret a)
That is, without a mark, sample sources don't get to do anything terribly special with the queries. This would mean that a query that wants to be subject to Poisson disk's special treatment of 2d points and special treatment of the set of points would need to be marked twice:
mark . sequenceA . take 6 . repeat . mark $ sample2d
The Applicative query language sort-of corresponds with your StratGen type; by having a merely Applicative interface it allows you to look ahead at the structure of the incoming query. The Monad then corresponds with your StratRun type.
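To illustrate what "looking ahead at the structure" can mean, here is a small sketch of an Applicative that never produces a sample and only counts how many are requested. The Count type and the shape of the SampleSource class are hypothetical stand-ins; a real interpreter would gather richer information, such as the marked groupings.

```haskell
-- Hypothetical class standing in for this answer's SampleSource.
class SampleSource f where
  sample :: f Float

-- An Applicative that never produces a value; it only counts sample requests.
newtype Count a = Count { requested :: Int }

instance Functor Count where
  fmap _ (Count n) = Count n

instance Applicative Count where
  pure _ = Count 0
  Count m <*> Count n = Count (m + n)

instance SampleSource Count where
  sample = Count 1

-- Assumed definition: a 2d sample is two 1d samples.
sample2d :: (Applicative f, SampleSource f) => f (Float, Float)
sample2d = (,) <$> sample <*> sample

-- Number of scalar samples the six-point query would request,
-- computed without drawing a single sample.
sixPointCost :: Int
sixPointCost = requested (sequenceA (replicate 6 sample2d))
```

This kind of static inspection is only possible because the query is Applicative: with a Monad, later requests could depend on earlier results, so the full structure would not be known up front.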

How to work around the first-order constraint on arrows?

What I mean by first-order constraint
First, I'll explain what I mean by first-order constraint on arrows:
Due to the way arrows desugar, you cannot use a locally bound name where an arrow command is expected in the arrow do-notation.
Here is an example to illustrate:
proc x -> f -< x + 1 desugars to arr (\x -> x + 1) >>> f and similarly proc x -> g x -< () would desugar to arr (\x -> ()) >>> g x, where the second x is a free variable. The GHC user guide explains this and says that when your arrow is also a monad you may make an instance of ArrowApply and use app to get around this. Something like, proc x -> g x -<< () becomes arr (\x -> (g x, ())) >>> app.
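The two desugarings can be tried out with ordinary functions, since (->) is an instance of both Arrow and ArrowApply. This is only a sketch to make the shapes concrete; f and g are made-up example functions, and SF from Yampa would accept the first form but not the second.

```haskell
import Control.Arrow (arr, (>>>), app)

-- Made-up first-order arrow: a plain function.
f :: Int -> Int
f = (* 2)

-- Desugared form of: proc x -> f -< x + 1
firstOrder :: Int -> Int
firstOrder = arr (\x -> x + 1) >>> f

-- Made-up arrow that depends on a locally bound value.
g :: Int -> () -> Int
g x () = x * 10

-- Desugared form of: proc x -> g x -<< ()
-- The arrow itself (g x) travels as data and is run by app.
higherOrder :: Int -> Int
higherOrder = arr (\x -> (g x, ())) >>> app
```

Here firstOrder 3 evaluates to 8 and higherOrder 3 evaluates to 30. The second definition typechecks only because functions have an ArrowApply instance.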
My Question
Yampa defines the accumHold function with this type: a -> SF (Event (a -> a)) a.
Due to this first-order limitation of arrows, I'm struggling to write the following function:
accumHoldNoiseR :: (RandomGen g, Random a) => (a,a) -> g -> SF (Event (a -> a)) a
accumHoldNoiseR r g = proc f -> do
    n <- noiseR r g -< ()
    accumHold n -< f
The definition above doesn't work because n is not in scope after desugaring.
Or, similarly, this function, where the first component of the input pair is meant to be the initial value passed to accumHold:
accumHold' :: SF (a, Event (a -> a)) a
accumHold' = ...
Is there some combinator or trick that I'm missing? Or is it not possible to write these definitions without an ArrowApply instance?
tl;dr: Is it possible to define accumHoldNoiseR :: (RandomGen g, Random a) => (a,a) -> g -> SF (Event (a -> a)) a or accumHold' :: SF (a, Event (a -> a)) a in yampa?
Note: There is no instance of ArrowApply for SF. My understanding is that it doesn't make sense to define one either. See "Programming with Arrows" for details.
This is a theoretical answer. Look to Roman Cheplyaka's answer to this question, which deals more with the practical details of what you're trying to achieve.
The reason n is out of scope is that, for it to be in scope there, you would need the equivalent of bind (>>=) from monads. Using the result of a previous computation as a functional input to the next one is exactly what makes something as powerful as a monad.
Hence you can supply n as a function argument to a subsequent arrow exactly when you can make an ArrowApply instance.
Chris Kuklewicz correctly points out in his comment that -<< would bring n into scope - it also uses app, so you need an ArrowApply instance.
Summary
Not unless you use ArrowApply. This is what ArrowApply is for.
noiseR is a signal function; it produces a stream of random numbers, not just one random number (for that, you'd just use randomR from System.Random).
On the other hand, the first argument of accumHold is just one, initial, value.
So this is not just some limitation — it actually prevents you from committing a type error.
If I understand correctly what you're trying to do, then simply using randomR should do the trick. Otherwise, please clarify why you need noiseR.
To help others understand how I worked around this I'll answer my own question.
I was trying to implement the game pong. I wanted the ball to start with a random velocity each round. I wanted to use accumHold to define the ball's velocity. I had some code like this:
ballPos = proc e -> mdo -- note the recursive do
    {- some clipping calculations using (x,y) -}
    ...
    vx <- accumHold 100 -< e `tag` collisionResponse paddleCollision
    vy <- accumHold 100 -< e `tag` collisionResponse ceilingFloorCollision
    (x,y) <- integral -< (vx,vy)
    returnA -< (x,y)
I wanted to replace the 100s with random values (presumably from noiseR).
How I solved this instead is to accumulate over the direction, where collisionResponse just flips the sign (eventually I'll want to use the angle of the velocity relative to wall/paddle):
ballPos = proc (initV, e) -> mdo
    {- some clipping calculations using (x,y) -}
    ...
    (iVx,iVy) <- hold (0,0) -< initV
    vx <- accumHold 1 -< e `tag` collisionResponse paddleCollision
    vy <- accumHold 1 -< e `tag` collisionResponse ceilingFloorCollision
    (x,y) <- integral -< (iVx*vx,iVy*vy)
    returnA -< (x,y)
Lesson Learned:
You can often separate the value/state you want to accumulate into a behavior describing how it changes and a "magnitude" that describes its current value, taking the behavior as input. In my case, I separate out the magnitude of the initial velocity, pass that as input to the signal function, and use accumHold to compute the effect of collisions on the ball (the behavior). So regardless of what the initial velocity was, hitting the walls "reflects" the ball. And that's exactly what the accumHold is accumulating.
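The same separation can be sketched without Yampa by modelling the event stream as a plain list. This is a simplified discrete stand-in, not Yampa's accumHold: accumHoldList and velocities are hypothetical names, and the list version also emits the initial value before any event has occurred.

```haskell
-- Discrete stand-in for accumHold: apply each update function to the
-- running value, keeping every intermediate result (initial value first).
accumHoldList :: a -> [a -> a] -> [a]
accumHoldList = scanl (flip ($))

-- The accumulated part is only the direction (starting at 1); the initial
-- magnitude is an ordinary input that is multiplied in afterwards.
velocities :: Double -> [Double -> Double] -> [Double]
velocities initV updates = map (initV *) (accumHoldList 1 updates)
```

For example, velocities 37.5 [negate, negate, negate] yields [37.5, -37.5, 37.5, -37.5]: the accumulator never needs to know the initial speed, so that speed can come from anywhere, including a random source.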

Resources