How to structure monads in Haskell programs which compute on random values?

I have a program that's almost pure mathematical computation. The problem is that some of those computations operate on Monte Carlo generated values.
It seems like I have two design options:
Either all my computation functions take an additional parameter which contains a pre-generated Monte Carlo chain. This lets me keep pure functions everywhere, but since there are functions that call other functions, this adds a lot of line noise to the code base.
The other option is to make all the computation functions monadic. This seems unfortunate, since some of the functions aren't even using those random values; they're just calling a function which calls a function which needs the random values.
Is there any guidance regarding the preferred design here? Specifically, the separation of monadic / non-monadic functions in the code where Monte Carlo values are concerned?

The other option is to make all the computation functions monadic. This seems unfortunate, since some of the functions aren't even using those random values; they're just calling a function which calls a function which needs the random values.
I would suggest following this approach, and I disagree with your assessment that it's "unfortunate." What monads are good at precisely is separating your pure code from your side-effecting code. Your pure functions can just have pure types, and the Functor/Applicative/Monad methods serve to "hook them up" with the random generation parts. Meditate on the signatures of the standard operations (here specialized to some idealized Random monad type):
-- Apply a pure function to a randomly selected value.
fmap :: (a -> b) -> Random a -> Random b
-- Apply a randomly selected function to a randomly selected argument.
-- The two random choices are independent.
(<*>) :: Random (a -> b) -> Random a -> Random b
-- Apply a two-argument function to two randomly selected arguments.
-- The two random choices are independent.
liftA2 :: (a -> b -> c) -> Random a -> Random b -> Random c
-- Make a `Random b` choice whose distribution depends on the value
-- sampled from the `Random a`.
(>>=) :: Random a -> (a -> Random b) -> Random b
So the reformulated version of your approach is:
Write pure functions wherever you can.
Adapt these pure functions to work on the random values by using the Functor/Applicative/Monad class operations.
Wherever you spot a function that's mentioning the Random type superfluously, figure out how to factor the Random part out using those classes' operations (or the copious utility functions that exist for them).
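For instance, here is a minimal sketch of that workflow using the MonadRandom package (the area function and the fixed seed are just for illustration):

import Control.Monad.Random (Rand, StdGen, evalRand, getRandomR, mkStdGen)
import Control.Applicative (liftA2)

-- Pure domain logic: no mention of randomness anywhere.
area :: Double -> Double -> Double
area w h = w * h

-- The only piece that actually samples.
sample :: Rand StdGen Double
sample = getRandomR (0, 1)

-- The pure function lifted over the random context with liftA2.
randomArea :: Rand StdGen Double
randomArea = liftA2 area sample sample

main :: IO ()
main = print (evalRand randomArea (mkStdGen 42))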
This is not specific to random number generation, by the way, but applies to any monad.
You might enjoy reading this article, and might want to check out the author's random generation monad library:
"Encoding Statistical Independence, Statically"
https://hackage.haskell.org/package/mwc-probability
I doubt you need to follow the article's approach of using free monads for modeling, but the conceptual bits about probability distribution monads will likely be of some help.

tl;dr:
Consider abstracting the random number generator and passing it as an argument. Haskell's type classes should help you hide that abstraction as much as possible.
Unfortunately, there is no silver bullet here. Since you are using side effects, your "functions" simply aren't functions in the proper sense. Haskell does not allow you to hide that fact (which makes up the largest part of its safety guarantees). So in some way you will need to express this fact. You also seem to conflate monadic operations and (plain) functions: a function that (indirectly) uses random values is implicitly monadic. A non-monadic function can always be used inside a monadic operation. So you should probably implement all truly non-monadic functions as such and see how far that carries you.
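For example, a type class can hide where the randomness comes from. A minimal sketch (the class and names are made up for illustration; the MonadRandom package provides something similar):

-- A hand-rolled abstraction over "some source of randomness".
class Monad m => HasRandom m where
  nextDouble :: m Double   -- a sample in [0, 1)

-- The arithmetic stays ordinary; only the sampling step mentions the class.
noisyAdd :: HasRandom m => Double -> Double -> m Double
noisyAdd x y = do
  eps <- nextDouble
  return (x + y + eps)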
As a completely unrelated side note: if laziness is not a requirement and Haskell's strong safety is too much of a burden for you, but you still want to write (mostly) functional code, you could give OCaml a try (or any other ML dialect, for that matter).

Related

What is a (composite) effect (possibly represented as a Monad + Monad Transformers)? Precise, clear, short answer/definition? [duplicate]

Time and again I read the term effectful, but I am still unable to give a clear definition of what it means. I assume the correct context is effectful computations, but I've also seen the term effectful values.
I used to think that effectful means having side effects. But in Haskell there are no side-effects (except to some extent IO). Still there are effectful computations all over the place.
Then I read that monads are used to create effectful computations. I can somewhat understand this in the context of the State Monad. But I fail to see any side-effect in the Maybe monad. In general it seems to me that Monads which wrap a function-like thing are easier to see as producing side-effects than Monads which just wrap a value.
When it comes to Applicative functors I am even more lost. I always saw applicative functors as a way to map a function with more than one argument. I cannot see any side-effect here. Or is there a difference between effectful and with effects?
A side effect of a function is an observable interaction with its environment (apart from computing its result value). In Haskell, we try hard to avoid functions with such side effects. This even applies to IO actions: when an IO action is evaluated, no side effects are performed; they happen only when the actions prescribed in the IO value are executed within main.
However, when working with abstractions that are related to composing computations, such as applicative functors and monads, it's convenient to somewhat distinguish between the actual value and the "rest", which we often call an "effect". In particular, if we have a type f of kind * -> *, then in f a the a part is "the value" and whatever "remains" is "the effect".
I intentionally quoted the terms, as there is no precise definition (as far as I know); it's merely a colloquial one. In some cases there are no values at all, or multiple values. For example, for Maybe the "effect" is that there might be no value (and the computation is aborted), while for [] the "effect" is that there are multiple (or zero) values. For more complex types this distinction can be even more difficult.
The distinction between "effects" and "values" doesn't really depend on the abstraction. Functor, Applicative and Monad just give us tools for working with them (Functors allow us to modify the values inside, Applicatives allow us to combine effects, and Monads allow effects to depend on previous values). But in the context of Monads, it's somewhat easier to create a mental picture of what is going on, because a monadic action can "see" the result value of the previous computation, as witnessed by the
(>>=) :: m a -> (a -> m b) -> m b
operator: The second function receives a value of type a, so we can imagine "the previous computation had some effect and now there is its result value with which we can do something".
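To make that concrete, here is a tiny sketch contrasting the "effects" of Maybe and []; the functions are made up for illustration:

-- For Maybe the "effect" is possible absence; for [] it is multiplicity.
halveEven :: Int -> Maybe Int
halveEven n = if even n then Just (n `div` 2) else Nothing

fine :: Maybe Int
fine = Just 12 >>= halveEven >>= halveEven    -- Just 3

aborted :: Maybe Int
aborted = Just 10 >>= halveEven >>= halveEven -- Nothing: the "effect" fires

pairs :: [(Int, Char)]
pairs = [1, 2] >>= \x -> "ab" >>= \c -> [(x, c)]  -- four results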
In support of Petr Pudlák's answer, here is an argument concerning the origin of the broader notion of "effect" espoused there.
The phrase "effectful programming" shows up in the abstract of McBride and Patterson's Applicative Programming with Effects, the paper which introduced applicative functors:
In this paper, we introduce Applicative functors — an abstract characterisation of an applicative style of effectful programming, weaker than Monads and hence more widespread.
"Effect" and "effectful" appear in a handful of other passages of the paper; these ocurrences are deemed unremarkable enough not to require an explicit clarification. For instance, this remark is made just after the definition of Applicative is presented (p. 3):
In each example, there is a type constructor f that embeds the usual notion of value, but supports its own peculiar way of giving meaning to the usual applicative language [...] We correspondingly introduce the Applicative class:
[A Haskell definition of Applicative]
This class generalises S and K [i.e. the S and K combinators, which show up in the Reader/function Applicative instance] from threading an environment to threading an effect in general.
From these quotes, we can infer that, in this context:
Effects are the things that Applicative threads "in general".
Effects are associated with the type constructors that are given Applicative instances.
Monad also deals with effects.
Following these leads, we can trace this usage of "effect" back to at least Wadler's papers on monads. For instance, here is a quote from page 6 of Monads for functional programming:
In general, a function of type a → b is replaced by a function of type a → M b. This can be read as a function that accepts an argument of type a and returns a result of type b, with a possible additional effect captured by M. This effect may be to act on state, generate output, raise an exception, or what have you.
And from the same paper, page 21:
If monads encapsulate effects and lists form a monad, do lists correspond to some effect? Indeed they do, and the effect they correspond to is choice. One can think of a computation of type [a] as offering a choice of values, one for each element of the list. The monadic equivalent of a function of type a → b is a function of type a → [b].
The "correspond to some effect" turn of phrase here is key. It ties back to the more straightforward claim in the abstract:
Monads provide a convenient framework for simulating effects found in other languages, such as global state, exception handling, output, or non-determinism.
The pitch is that monads can be used to express things that, in "other languages", are typically encoded as side-effects -- that is, as Petr Pudlák puts it in his answer here, "an observable interaction with [a function's] environment (apart from computing its result value)". Through metonymy, that has readily led to "effect" acquiring a second meaning, broader than that of "side-effect" -- namely, whatever is introduced through a type constructor which is a Monad instance. Over time, this meaning was further generalised to cover other functor classes such as Applicative, as seen in McBride and Paterson's work.
In summary, I consider "effect" to have two reasonable meanings in Haskell parlance:
A "literal" or "absolute" one: an effect is a side-effect; and
A "generalised" or "relative" one: an effect is a functorial context.
On occasion, avoidable disagreements over terminology happen when each of the involved parties implicitly assumes a different meaning of "effect". Another possible point of contention involves whether it is legitimate to speak of effects when dealing with Functor alone, as opposed to subclasses such as Applicative or Monad (I believe it is okay to do so, in agreement with Petr Pudlák's answer to Why can applicative functors have side effects, but functors can't?).
To my mind, a "side effect" is anything that a normal function couldn't do. In other words, anything in addition to just returning a value.
Consider the following code block:
let y = foo x
    z = bar y
in  foobar z
This calls foo, and then calls bar, and then calls foobar, three ordinary functions. Simple enough, right? Now consider this:
do y <- foo x
   z <- bar y
   foobar z
This also calls three functions, but it invisibly calls (>>=) between each pair of lines as well. And that means that some strange things can happen, depending on what type of monad the functions are running in.
If this is the identity monad, nothing special happens. The monadic version does exactly the same thing as the pure version. There are no side-effects.
If each function returns a Maybe-something, then if (say) bar returns Nothing, the entire code block aborts. A normal function can't do that. (I.e., in the pure version, there is no way to prevent foobar being called.) So this version does something that the pure version cannot. Each function can return a value or abort the block. That's a side-effect.
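For instance, a minimal sketch of that behaviour, with hypothetical foo, bar and foobar in the Maybe monad:

foo :: Int -> Maybe Int
foo x = if x > 0 then Just (x * 2) else Nothing

bar :: Int -> Maybe Int
bar y = if y < 100 then Just (y + 1) else Nothing

foobar :: Int -> Maybe String
foobar z = Just ("result: " ++ show z)

run :: Int -> Maybe String
run x = do
  y <- foo x    -- if this is Nothing, the whole block is Nothing
  z <- bar y    -- foobar is never reached in that case
  foobar z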
If each function returns a list-of-something, then the code executes for all possible combinations of results. Again, in the pure version, there is no way to make any of the functions execute multiple times with different arguments. So that's a side-effect.
If each function runs in a state monad, then (for example) foo can send some data directly to foobar, in addition to the value you can see being passed through bar. Again, you can't do that with pure functions, so that's a side-effect.
In the IO monad, you have all sorts of interesting effects. You can save files to disk (a file is basically a giant global variable); you can even affect code running on other computers (we call this network I/O).
The ST monad is a cut-down version of the IO monad. It allows mutable state, but self-contained computations cannot influence each other.
The STM monad lets multiple threads talk to each other, and may cause the code to execute multiple times, and... well, you can't do any of this with normal functions.
The continuation monad allows you to break people's minds! Arguably that is possible with pure functions...
"Effect is a very vague term and that is ok because we are trying to talk about something that is outside the language. Effect and side effect are not the same thing. Effects are good. Side effects are bugs.
Their lexical similarity is really unfortunate because it leads to a lot of people conflating these ideas when they read about them and people using one instead of the other so it leads to a lot of confusion."
see here for more: https://www.slideshare.net/pjschwarz/rob-norrisfunctionalprogrammingwitheffects
Functional programming context
Effect generally means the stuff (behaviour, additional logic) that is implemented in Applicative/Monad instances.
Also, it can be said that a simple value is extended with additional behaviour.
For example,
Option models the effects of optionality
or
Option is a monad that models the effect of optionality (of being something optional)

Monads in Haskell and Purity

My question is whether monads in Haskell actually maintain Haskell's purity, and if so how. Frequently I have read about how side effects are impure but that side effects are needed for useful programs (e.g. I/O). In the next sentence it is stated that Haskell's solution to this is monads. Then monads are explained to some degree or another, but not really how they solve the side-effect problem.
I have seen this and this, and my interpretation of the answers is actually one that came to me in my own readings -- the "actions" of the IO monad are not the I/O themselves but objects that, when executed, perform I/O. But it occurs to me that one could make the same argument for any code or perhaps any compiled executable. Couldn't you say that a C++ program only produces side effects when the compiled code is executed? That all of C++ is inside the IO monad and so C++ is pure? I doubt this is true, but I honestly don't know in what way it is not. In fact, didn't Moggi (sp?) initially use monads to model the denotational semantics of imperative programs?
Some background: I am a fan of Haskell and functional programming and I hope to learn more about both as my studies continue. I understand the benefits of referential transparency, for example. The motivation for this question is that I am a grad student and I will be giving two one-hour presentations to a programming languages class, one covering Haskell in particular and the other covering functional programming in general. I suspect that the majority of the class is not familiar with functional programming, maybe having seen a bit of Scheme. I hope to be able to (reasonably) clearly explain how monads solve the purity problem without going into category theory and the theoretical underpinnings of monads, which I wouldn't have time to cover and anyway don't fully understand myself -- certainly not well enough to present.
I wonder if "purity" in this context is not really well-defined?
It's hard to argue conclusively in either direction because "pure" is not particularly well-defined. Certainly, something makes Haskell fundamentally different from other languages, and it's deeply related to managing side-effects and the IO type¹, but it's not clear exactly what that something is. Given a concrete definition to refer to we could just check if it applies, but this isn't easy: such definitions will tend to either not match everyone's expectations or be too broad to be useful.
So what makes Haskell special, then? In my view, it's the separation between evaluation and execution.
The base language—closely related to the λ-calculus—is all about the former. You work with expressions that evaluate to other expressions, 1 + 1 to 2. No side-effects here, not because they were suppressed or removed but simply because they don't make sense in the first place. They're not part of the model² any more than, say, backtracking search is part of the model of Java (as opposed to Prolog).
If we just stuck to this base language with no added facilities for IO, I think it would be fairly uncontroversial to call it "pure". It would still be useful as, perhaps, a replacement for Mathematica. You would write your program as an expression and then get the result of evaluating the expression at the REPL. Nothing more than a fancy calculator, and nobody accuses the expression language you use in a calculator of being impure³!
But, of course, this is too limiting. We want to use our language to read files and serve web pages and draw pictures and control robots and interact with the user. So the question, then, is how to preserve everything we like about evaluating expressions while extending our language to do everything we want.
The answer we've come up with? IO. A special type of expression that our calculator-like language can evaluate which corresponds to doing some effectful actions. Crucially, evaluation still works just as before, even for things in IO. The effects get executed in the order specified by the resulting IO value, not based on how it was evaluated. IO is what we use to introduce and manage effects into our otherwise-pure expression language.
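Here is a tiny sketch of that separation (just an illustration): evaluating the expressions below performs no effects; the printing happens only because the value reached from main gets executed.

-- Evaluating this list builds IO values; nothing is printed yet.
greetings :: [IO ()]
greetings = map putStrLn ["hello", "world"]

-- Only here do the effects get executed, in the order the IO value prescribes.
main :: IO ()
main = sequence_ (reverse greetings)   -- prints "world", then "hello"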
I think that's enough to make describing Haskell as "pure" meaningful.
footnotes
¹ Note how I said IO and not monads in general: the concept of a monad is immensely useful for dozens of things unrelated to input and output, and the IO type has to be more than just a monad to be useful. I feel the two are linked too closely in common discourse.
² This is why unsafePerformIO is so, well, unsafe: it breaks the core abstraction of the language. This is the same as, say, putzing with specific registers in C: it can both cause weird behavior and stop your code from being portable because it goes below C's level of abstraction.
³ Well, mostly, as long as we ignore things like generating random numbers.
A function with type, for example, a -> IO b always returns an identical IO action when given the same input; it is pure in that it cannot possibly inspect the environment, and obeys all the usual rules for pure functions. This means that, among other things, the compiler can apply all of its usual optimization rules to functions with an IO in their type, because it knows they are still pure functions.
Now, the IO action returned may, when run, look at the environment, read files, modify global state, whatever; all bets are off once you run an action. But you don't necessarily have to run an action; you can put five of them into a list and then run them in reverse of the order in which you created them, or never run some of them at all, if you want; you couldn't do this if IO actions implicitly ran themselves when you created them.
Consider this silly program:
main :: IO ()
main = do
  inputs <- take 5 . lines <$> getContents
  let [line1, line2, line3, line4, line5] = map print inputs
  line3
  line1
  line2
  line5
If you run this, and then enter 5 lines, you will see them printed back to you, but in a different order and with one omitted, even though our Haskell program runs map print over them in the order they were received. You couldn't do this with C's printf, because it immediately performs its I/O when called; Haskell's version just returns an IO action, which you can still manipulate as a first-class value and do whatever you want with.
I see two main differences here:
1) In Haskell, you can do things that are not in the IO monad. Why is this good? Because if you have a function definitelyDoesntLaunchNukes :: Int -> IO Int you don't know that the resulting IO action doesn't launch nukes; it might, for all you know. cantLaunchNukes :: Int -> Int will definitely not launch any nukes (barring any ugly hacks that you should avoid in nearly all circumstances).
2) In Haskell, it's not just a cute analogy: IO actions are first-class values. You can put them in lists and leave them there for as long as you want; they won't do anything unless they somehow become part of the main action. The closest that C has to that are function pointers, which are quite a bit more cumbersome to use. In C++ (and most modern imperative languages, really) you have closures, which technically could be used for this purpose, but rarely are - mainly because Haskell is pure and they aren't.
Why does that distinction matter here? Well, where are you going to get your other IO actions/closures from? Probably from functions/methods of some description. Which, in an impure language, can themselves have side effects, rendering the attempt to isolate them in these languages pointless.
fiction-mode: Active
It was quite a challenge, and I think a wormhole could be forming in the neighbour's backyard, but I managed to grab part of a Haskell I/O implementation from an alternate reality:
class Kleisli k where
  infixr 1 >=>
  simple :: (a -> b) -> (a -> k b)
  (>=>)  :: (a -> k b) -> (b -> k c) -> a -> k c

instance Kleisli IO where
  simple = primSimpleIO
  (>=>)  = primPipeIO

primitive primSimpleIO :: (a -> b) -> (a -> IO b)
primitive primPipeIO   :: (a -> IO b) -> (b -> IO c) -> a -> IO c
Back in our slightly-mutilated reality (sorry!), I have used this other form of Haskell I/O to define our form of Haskell I/O:
instance Monad IO where
  return x = simple (const x) ()
  m >>= k  = (const m >=> k) ()
and it works!
fiction-mode: Offline
My question is whether monads in Haskell actually maintain Haskell's purity, and if so how.
The monadic interface, by itself, doesn't restrain the effects - it is only an interface, albeit a jolly versatile one. As my little work of fiction shows, there are other possible interfaces for the job - it's just a matter of how convenient they are to use in practice.
For an implementation of Haskell I/O, what keeps the effects under control is that all the pertinent entities, be they:
IO, simple, (>=>) etc
or:
IO, return, (>>=) etc
are abstract - how the implementation defines those is kept private.
Otherwise, you would be able to devise "novelties" like this:
what_the_heck =
  do spare_world <- getWorld   -- how easy was that?
     launchMissiles            -- let's mess everything up,
     putWorld spare_world      -- and bring it all back :-D
     what_the_heck             -- that was fun; let's do it again!
(Aren't you glad our reality isn't quite so pliable? ;-)
This observation extends to types like ST (encapsulated state) and STM (concurrency) and their stewards (runST, atomically etc). For types like lists, Maybe and Either, their orthodox definitions in Haskell mean no visible effects.
So when you see an interface - monadic, applicative, etc - for certain abstract types, any effects (if they exist) are contained by keeping its implementation private; safe from being used in aberrant ways.

Sequence of Haskell's Random Generators

I'd like to write the following Haskell function that will provide me with a list of unique random generators:
randomGenerators :: RandomGen g => g -> [g]
Is the following a reasonable solution that won't create a situation where the "same" sequences are repeated?
randomGenerators g = iterate (fst . split) g
I am obviously throwing away half of all the generators but will this be a problem?
This will work, provided that split is implemented correctly (that is, if it produces uncorrelated generators). The one in System.Random is believed to be robust (although its implementation of split contains the comment "no statistical foundation for this!", so use it at your own risk and test for correlations).
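For what it's worth, a minimal usage sketch with System.Random (the seed and the number of consumers are arbitrary):

import System.Random (RandomGen, StdGen, mkStdGen, randoms, split)

randomGenerators :: RandomGen g => g -> [g]
randomGenerators = iterate (fst . split)

-- Hand three consumers their own (hopefully uncorrelated) streams of doubles.
streams :: [[Double]]
streams = map randoms (take 3 (randomGenerators (mkStdGen 2023)))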
Alternatively, you can use an RNG specifically designed to be used in parallel batches. For example, I have a package Random123 which implements counter-based generators (not very well optimized for performance right now, but it may suit your purposes). There may also be bindings for the DCMT library out there, or you can write your own.

ST Monad == code smell?

I'm working on implementing the UCT algorithm in Haskell, which requires a fair amount of data juggling. Without getting into too much detail, it's a simulation algorithm where, at each "step," a leaf node in the search tree is selected based on some statistical properties, a new child node is constructed at that leaf, and the stats corresponding to the new leaf and all of its ancestors are updated.
Given all that juggling, I'm not really sharp enough to figure out how to make the whole search tree a nice immutable data structure à la Okasaki. Instead, I've been playing around with the ST monad a bit, creating structures composed of mutable STRefs. A contrived example (unrelated to UCT):
import Control.Monad
import Control.Monad.ST
import Data.STRef

data STRefPair s a b = STRefPair { left :: STRef s a, right :: STRef s b }

mkStRefPair :: a -> b -> ST s (STRefPair s a b)
mkStRefPair a b = do
  a' <- newSTRef a
  b' <- newSTRef b
  return $ STRefPair a' b'

derp :: (Num a, Num b) => STRefPair s a b -> ST s ()
derp p = do
  modifySTRef (left p) (\x -> x + 1)
  modifySTRef (right p) (\x -> x - 1)

herp :: (Num a, Num b) => (a, b)
herp = runST $ do
  p <- mkStRefPair 0 0
  replicateM_ 10 $ derp p
  a <- readSTRef $ left p
  b <- readSTRef $ right p
  return (a, b)

main = print herp -- should print (10, -10)
Obviously this particular example would be much easier to write without using ST, but hopefully it's clear where I'm going with this... if I were to apply this sort of style to my UCT use case, is that wrong-headed?
Somebody asked a similar question here a couple years back, but I think my question is a bit different... I have no problem using monads to encapsulate mutable state when appropriate, but it's that "when appropriate" clause that gets me. I'm worried that I'm reverting to an object-oriented mindset prematurely, where I have a bunch of objects with getters and setters. Not exactly idiomatic Haskell...
On the other hand, if it is a reasonable coding style for some set of problems, I guess my question becomes: are there any well-known ways to keep this kind of code readable and maintainable? I'm sort of grossed out by all the explicit reads and writes, and especially grossed out by having to translate from my STRef-based structures inside the ST monad to isomorphic but immutable structures outside.
I don't use ST much, but sometimes it is just the best solution. This can happen in many scenarios:
There are already well-known, efficient ways to solve a problem. Quicksort is a perfect example of this. It is known for its speed and in-place behavior, which cannot be imitated by pure code very well.
You need rigid time and space bounds. Especially with lazy evaluation (and Haskell doesn't even specify whether there is lazy evaluation, just that it is non-strict), the behavior of your programs can be very unpredictable. Whether there is a memory leak could depend on whether a certain optimization is enabled. This is very different from imperative code, which has a fixed set of variables (usually) and defined evaluation order.
You've got a deadline. Although the pure style is almost always better practice and cleaner code, if you are used to writing imperatively and need the code soon, starting imperative and moving to functional later is a perfectly reasonable choice.
When I do use ST (and other monads), I try to follow these general guidelines:
Use Applicative style often. This makes the code easier to read and, if you do switch to an immutable version, much easier to convert. Not only that, but Applicative style is much more compact.
Don't just use ST. If you program only in ST, the result will be no better than a huge C program, possibly worse because of the explicit reads and writes. Instead, intersperse pure Haskell code where it applies. I often find myself using things like STRef s (Map k [v]). The map itself is being mutated, but much of the heavy lifting is done purely (there's a small sketch of this pattern after these notes).
Don't remake libraries if you don't have to. A lot of code written for IO can be cleanly, and fairly mechanically, converted to ST. Replacing all the IORefs with STRefs and IOs with STs in Data.HashTable was much easier than writing a hand-coded hash table implementation would have been, and probably faster too.
One last note - if you are having trouble with the explicit reads and writes, there are ways around it.
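Here is the promised sketch of the STRef s (Map k [v]) pattern; groupByKey is just an illustrative name. The reference is mutated, but each Map update is itself pure:

import Control.Monad.ST (runST)
import Data.STRef (modifySTRef, newSTRef, readSTRef)
import qualified Data.Map as Map

groupByKey :: Ord k => [(k, v)] -> Map.Map k [v]
groupByKey kvs = runST $ do
  ref <- newSTRef Map.empty
  mapM_ (\(k, v) -> modifySTRef ref (Map.insertWith (++) k [v])) kvs
  readSTRef ref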
Algorithms which make use of mutation and algorithms which do not are different algorithms. Sometimes there is a straightforward bounds-preserving translation from the former to the latter, sometimes a difficult one, and sometimes only one which does not preserve complexity bounds.
A skim of the paper suggests to me that it doesn't make essential use of mutation -- and so I think a potentially really nifty lazy functional algorithm could be developed. But it would be a different, though related, algorithm to the one described.
Below, I describe one such approach -- not necessarily the best or most clever, but pretty straightforward:
Here's the setup as I understand it -- A) a branching tree is constructed; B) payoffs are then pushed back from the leaves to the root, which then indicates the best choice at any given step. But this is expensive, so instead only portions of the tree are explored to the leaves, in a nondeterministic manner. Furthermore, each further exploration of the tree is determined by what's been learned in previous explorations.
So we build code to describe the "stage-wise" tree. Then we have another data structure to define a partially explored tree along with partial reward estimates. We then have a function of type randseed -> ptree -> ptree that, given a random seed and a partially explored tree, embarks on one further exploration of the tree, updating the ptree structure as it goes. Then we can just iterate this function, starting from an empty ptree and feeding it a stream of seeds, to get a list of increasingly well-sampled ptrees. We can then walk this list until some specified cutoff condition is met.
So now we've gone from one algorithm where everything is blended together to three distinct steps -- 1) building the whole state tree, lazily; 2) updating some partial exploration by sampling that structure; and 3) deciding when we've gathered enough samples.
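A very rough sketch of that shape (all the names and types here are hypothetical placeholders, not working UCT code):

import Data.List (unfoldr)
import System.Random (StdGen, split)

data GameTree = GameTree    -- step 1: the full state tree, built lazily
data PTree    = EmptyPTree  -- a partially explored tree with reward estimates

-- Step 2: one further exploration of the tree, driven by a fresh generator.
explore :: StdGen -> GameTree -> PTree -> PTree
explore _ _ ptree = ptree   -- placeholder; the real work is domain-specific

-- Step 3: a lazy list of increasingly sampled trees; walk it until a cutoff holds.
explorations :: StdGen -> GameTree -> [PTree]
explorations g0 tree = scanl (\p g -> explore g tree p) EmptyPTree gens
  where gens = unfoldr (Just . split) g0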
It can be really difficult to tell when using ST is appropriate. I would suggest you do it with ST and without ST (not necessarily in that order). Keep the non-ST version simple; using ST should be seen as an optimization, and you don't want to do that until you know you need it.
I have to admit that I cannot read the Haskell code. But if you use ST for mutating the tree, then you can probably replace this with an immutable tree without losing much because:
Same complexity for mutable and immutable tree
You have to mutate every node above the new leaf. An immutable tree has to replace all nodes above the modified node. So in both cases the touched nodes are the same, thus you don't gain anything in complexity.
In Java, for example, object creation is more expensive than mutation, so maybe you can gain a bit here in Haskell by using mutation. But I don't know this for sure, and a small gain does not buy you much because of the next point.
Updating the tree is presumably not the bottleneck
The evaluation of the new leaf will probably be much more expensive than updating the tree. At least this is the case for UCT in computer Go.
Use of the ST monad is usually (but not always) as an optimization. For any optimization, I apply the same procedure:
Write the code without it,
Profile and identify bottlenecks,
Incrementally rewrite the bottlenecks and test for improvements/regressions.
The other use case I know of is as an alternative to the state monad. The key difference being that with the state monad the type of all of the data stored is specified in a top-down way, whereas with the ST monad it is specified bottom-up. There are cases where this is useful.

Unsure of how to design a useful library using combinators

I've been reading about combinators and seen how useful they are (for example, in Haskell's Parsec). My problem is that I'm not quite sure how to use them practically.
Here's an outline of the problem: distributions can be generated, filtered, and modified. Distributions may be combined to create new distributions.
The basic interfaces are (in pseudo-Haskell type terminology):
generator :: parameters -> distribution
selector  :: parameters -> (distribution -> distribution)
modifier  :: parameters -> (distribution -> distribution)
Now, I think that I see three combinators:
combine :: generator -> generator -> generator
filter  :: generator -> selector -> generator
modify  :: generator -> modifier -> generator
Are these actually combinators? Do the combinators make sense/are there any other obvious combinators that I'm missing?
Thanks for any advice.
The selector and modifier functions are already perfectly good combinators! Along with generator and combine you can do stuff like (I'm going to assume statistical distributions for concreteness and just make things up!):
modifier (Scale 3.0) $ generator StandardGaussian `combine` selector (LargerThan 10) . modifier (Shift 7) $ generator (Binomial 30 0.2)
You may have to mess around a bit with the priority of the combine operator for this to work smoothly :)
In general, when I'm trying to design a combinator library for values of type A, I like to keep my A's "at the end", so that the partially applied combinators (your selector and modifier) can be chained together with . instead of having to flip through hoops.
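A minimal sketch of that style, with made-up types and operations:

newtype Distribution = Distribution [Double]

type Selector = Distribution -> Distribution
type Modifier = Distribution -> Distribution

scaleBy :: Double -> Modifier
scaleBy k (Distribution xs) = Distribution (map (* k) xs)

largerThan :: Double -> Selector
largerThan t (Distribution xs) = Distribution (filter (> t) xs)

-- Because the Distribution argument comes last, stages chain with plain (.):
pipeline :: Distribution -> Distribution
pipeline = scaleBy 3.0 . largerThan 10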
Here's a nice blog article that can help you design combinators; it influenced a lot of my thinking: Semantic Editor Combinators.
EDIT: I may have misread your question, given the type signature of combine. Maybe I'm missing something, but wouldn't the distributions be the more natural objects your combinator should work on?
