Use of Haskell state monad a code smell? - haskell

God I hate the term "code smell", but I can't think of anything more accurate.
I'm designing a high-level language & compiler to Whitespace in my spare time to learn about compiler construction, language design, and functional programming (compiler is being written in Haskell).
During the code generation phase of the compiler, I have to maintain "state"-ish data as I traverse the syntax tree. For example, when compiling flow-control statements I need to generate unique names for the labels to jump to (labels generated from a counter that's passed in, updated, & returned, and the old value of the counter must never be used again). Another example is when I come across in-line string literals in the syntax tree, they need to be permanently converted into heap variables (in Whitespace, strings are best stored on the heap). I'm currently wrapping the entire code generation module in the state monad to handle this.
I've been told that writing a compiler is a problem well suited to the functional paradigm, but I find that I'm designing this in much the same way I would design it in C (you really can write C in any language - even Haskell w/ state monads).
I want to learn how to think in Haskell (rather, in the functional paradigm) - not in C with Haskell syntax. Should I really try to eliminate/minimize use of the state monad, or is it a legitimate functional "design pattern"?

I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract---don't make it obvious you're using a monad.
Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:
emptyGraph :: Graph
mkLabel :: Label -> Graph
mkAssignment :: Assignment -> Graph -- modify a register or memory
mkTransfer :: ControlTransfer -> Graph -- any control transfer
(<*>) :: Graph -> Graph -> Graph
But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:
withFreshLabel :: (Label -> Graph) -> Graph
mkIfThenElse :: (Label -> Label -> Graph) -- branch condition
             -> Graph -- code in the 'then' branch
             -> Graph -- code in the 'else' branch
             -> Graph -- resulting if-then-else construct
The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.
The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:
type Graph = RealGraph -> [Label] -> (RealGraph, [Label])
or a bit more accurately
type Graph = RealGraph -> State [Label] RealGraph
-- a Graph is a monadic function from a successor RealGraph to a new RealGraph
With the state monad hidden behind a layer of abstraction, it's not smelly at all!
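To make the idea concrete, here is a minimal sketch of a graph-building API with the label supply hidden behind a newtype. It is not GHC's actual code: the Instr type, the Int labels, mkBranch, mkGoto, runGraph and the operator name (<+>) are all assumptions made for illustration, and the representation is simplified to a State computation producing a list of instructions (it relies on Control.Monad.State from the mtl package).

```haskell
import Control.Monad.State (State, evalState, get, put)

-- Hypothetical stand-ins; GHC's real Label and graph types are richer.
type Label = Int

data Instr
  = LabelI Label
  | Assign String
  | Goto Label
  | CondBranch String Label Label
  deriving (Eq, Show)

-- Clients treat Graph as abstract; underneath it threads a label counter.
newtype Graph = Graph (State Label [Instr])

emptyGraph :: Graph
emptyGraph = Graph (pure [])

mkLabel :: Label -> Graph
mkLabel l = Graph (pure [LabelI l])

mkAssignment :: String -> Graph
mkAssignment a = Graph (pure [Assign a])

mkGoto :: Label -> Graph
mkGoto l = Graph (pure [Goto l])

mkBranch :: String -> Label -> Label -> Graph
mkBranch c t e = Graph (pure [CondBranch c t e])

-- Sequencing; named (<+>) here so it does not clash with Prelude's (<*>).
(<+>) :: Graph -> Graph -> Graph
Graph a <+> Graph b = Graph ((++) <$> a <*> b)

-- The only function that touches the counter.
withFreshLabel :: (Label -> Graph) -> Graph
withFreshLabel k = Graph $ do
  l <- get
  put (l + 1)
  let Graph g = k l
  g

mkIfThenElse :: (Label -> Label -> Graph) -> Graph -> Graph -> Graph
mkIfThenElse cond thenG elseG =
  withFreshLabel $ \t ->
  withFreshLabel $ \e ->
  withFreshLabel $ \done ->
    cond t e
      <+> mkLabel t <+> thenG <+> mkGoto done
      <+> mkLabel e <+> elseG
      <+> mkLabel done

-- Supply the initial label and pull out the plain data structure.
runGraph :: Graph -> [Instr]
runGraph (Graph g) = evalState g 0
```

Client code only ever calls mkIfThenElse and friends; nothing in its types mentions State, and the counter can only be advanced through withFreshLabel.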

I'd say that state in general is not a code smell, so long as it's kept small and well controlled.
This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.
A fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts to this:
VM: the virtual machine that ran the simulation program
Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
Solution: generation of the solution file based on the output of the controllers
Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed
Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.
The key point here was that the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.
All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).
When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.
One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten into it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly and in one place all of a small amount of code that can change the state, and more easily reason about how it can be changed, again making it easier to deal with.
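The parser point can be sketched in a few lines. This is a toy parser written for illustration, not any particular library; all names (Parser, item, failP, orElse, and so on) are assumptions. Only the three primitives look at the input string directly; everything else is built from them.

```haskell
import Data.Char (isDigit)

-- The parse state is just the remaining input.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure a = Parser (\s -> Just (a, s))
  pf <*> pa = Parser $ \s -> case runParser pf s of
    Nothing        -> Nothing
    Just (f, rest) -> runParser (fmap f pa) rest

instance Monad Parser where
  Parser p >>= k = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> runParser (k a) rest

-- Primitive 1: consume one character.
item :: Parser Char
item = Parser $ \s -> case s of
  []       -> Nothing
  (c : cs) -> Just (c, cs)

-- Primitive 2: fail without consuming anything.
failP :: Parser a
failP = Parser (const Nothing)

-- Primitive 3: try the first parser, fall back to the second on failure.
orElse :: Parser a -> Parser a -> Parser a
orElse p q = Parser $ \s -> case runParser p s of
  Nothing -> runParser q s
  result  -> result

-- Everything below never mentions the input string.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = do
  c <- item
  if ok c then pure c else failP

many0 :: Parser a -> Parser [a]
many0 p = many1 p `orElse` pure []

many1 :: Parser a -> Parser [a]
many1 p = (:) <$> p <*> many0 p

number :: Parser Int
number = read <$> many1 (satisfy isDigit)
```

To audit how the parse state can change, you only need to read item, failP and orElse.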

Have you looked at attribute grammars (AG)? (There is more info on Wikipedia and in an article in the Monad Reader.)
With AG you can add attributes to a syntax tree. These attributes are separated in synthesized and inherited attributes.
Synthesized attributes are things you generate (or synthesize) from your syntax tree; this could be the generated code, or all comments, or whatever else you're interested in.
Inherited attributes are input to your syntax tree, this could be the environment, or a list of labels to use during code generation.
At Utrecht University we use the Attribute Grammar System (UUAGC) to write compilers. This is a pre-processor which generates Haskell code (.hs files) from the provided .ag files.
Although, if you're still learning Haskell, then maybe this is not the time to start learning yet another layer of abstraction over that.
In that case, you could manually write the sort of code that attribute grammars generate for you, for example:
data AbstractSyntax = Literal Int | Block AbstractSyntax
                    | Comment String AbstractSyntax
compile :: AbstractSyntax -> [Label] -> (Code, Comments)
compile (Literal x) _ = (generateCode x, [])
compile (Block ast) (l:ls) = let (code', comments) = compile ast ls
                             in (labelCode l code', comments)
compile (Comment s ast) ls = let (code, comments') = compile ast ls
                             in (code, s : comments')
generateCode :: Int -> Code
labelCode :: Label -> Code -> Code

It's possible that you may want an applicative functor instead of a
monad:
http://www.haskell.org/haskellwiki/Applicative_functor
I think the original paper explains it better than the wiki, however:
http://www.soi.city.ac.uk/~ross/papers/Applicative.html

I don't think using the State Monad is a code smell when it is used to model state.
If you need to thread state through your functions,
you can do this explicitly, taking the state as an argument and returning it in each function.
The State Monad offers a good abstraction: it passes the state along for you and
provides lots of useful functions to combine functions that require state.
In this case, using the State Monad (or Applicatives) is not a code smell.
However, if you use the State Monad to emulate an imperative style of programming
while a functional solution would suffice, you are just making things complicated.
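A small sketch of the comparison, using label generation from the question as the state being threaded. The function names are made up for illustration, and the second half assumes Control.Monad.State from the mtl package.

```haskell
import Control.Monad.State (State, evalState, state)

-- Explicit threading: each function takes the counter and returns the new one.
freshExplicit :: Int -> (String, Int)
freshExplicit n = ("L" ++ show n, n + 1)

twoLabelsExplicit :: Int -> ((String, String), Int)
twoLabelsExplicit n0 =
  let (a, n1) = freshExplicit n0
      (b, n2) = freshExplicit n1 -- using n0 here by mistake would reuse a label
  in ((a, b), n2)

-- The same logic in State: the counter is passed along for us.
fresh :: State Int String
fresh = state freshExplicit

twoLabels :: State Int (String, String)
twoLabels = (,) <$> fresh <*> fresh

-- A larger combination comes for free from the standard combinators.
manyLabels :: Int -> State Int [String]
manyLabels k = mapM (const fresh) [1 .. k]
```

Both versions compute the same thing; the State version simply removes the opportunity to pass a stale counter.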

In general you should try to avoid state wherever possible, but that's not always practical. Applicative makes effectful code look nicer and more functional; tree traversal code in particular can benefit from this style. For the problem of name generation there is now a rather nice package available: value-supply.
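As an illustration of the applicative style for tree traversal, here is a sketch that numbers the leaves of a tree. The Tree type and the label function are invented for the example; it uses State from mtl rather than the value-supply package, whose API is not shown here.

```haskell
import Control.Monad.State (State, evalState, state)

data Tree a = Leaf a | Node (Tree a) (Tree a)
  deriving (Eq, Show)

instance Functor Tree where
  fmap f (Leaf a)   = Leaf (f a)
  fmap f (Node l r) = Node (fmap f l) (fmap f r)

instance Foldable Tree where
  foldMap f (Leaf a)   = f a
  foldMap f (Node l r) = foldMap f l `mappend` foldMap f r

-- The traversal is written once, in applicative style, with no mention of state.
instance Traversable Tree where
  traverse f (Leaf a)   = Leaf <$> f a
  traverse f (Node l r) = Node <$> traverse f l <*> traverse f r

-- Attach a unique number to every leaf, left to right.
label :: Tree a -> Tree (Int, a)
label t = evalState (traverse step t) 0
  where
    step :: a -> State Int (Int, a)
    step a = state (\n -> ((n, a), n + 1))
```

The stateful part is confined to step; the shape-preserving recursion lives in the Traversable instance and can be reused with any other applicative effect.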

Well, don't use monads. The power of functional programming is function purity and their reuse. There's this paper a professor of mine once wrote and he's one of the guys who helped build Haskell.
The paper is called "Why functional programming matters", I suggest you read through it. It's a good read.

Let's be careful about the terminology here. State is not per se bad; functional languages have state. What is a "code smell" is when you find yourself wanting to assign variables values and change them.
Of course, the Haskell state monad is there for just that reason -- as with I/O, it's letting you do unsafe and un-functional things in a constrained context.
So, yes, it's probably a code smell.

Related

Use type classes to implement dependency inversion in a Haskell application?

One major architectural goal when designing large applications is to reduce coupling and dependencies. By dependencies, I mean source-code dependencies, when one function or data type uses another function or another type. A high-level architecture guideline seems to be the Ports & Adapters architecture, with slight variations also referred to as Onion Architecture, Hexagonal Architecture, or Clean Architecture: Types and functions that model the domain of the application are at the center, then come use cases that provide useful services on the basis of the domain, and in the outermost ring are technical aspects like persistence, networking and UI.
The dependency rule says that dependencies must point inwards only. E.g., persistence may depend on functions and types from use cases, and use cases may depend on functions and types from the domain. But the domain is not allowed to depend on the outer rings. How should I implement this kind of architecture in Haskell? To make it concrete: How can I implement a use case module that does not depend on (= import) functions and types from a persistence module, even though it needs to retrieve and store data?
Say I want to implement a use case order placement via a function U.placeOrder :: D.Customer -> [D.LineItem] -> IO U.OrderPlacementResult, which creates an order from line items and attempts to persist the order. Here, U indicates the use case module and D the domain module. The function returns an IO action because it somehow needs to persist the order. However, the persistence itself is in the outermost architectural ring - implemented in some module P; so, the above function must not depend on anything exported from P.
I can imagine two generic solutions:
Higher order functions: The function U.placeOrder takes an additional function argument, say U.OrderDto -> U.PersistenceResult. This function is implemented in the persistence (P) module, but it depends on types of the U module, whereas the U module does not need to declare a dependency on P.
Type classes: The U module defines a Persistence type class that declares the above function. The P module depends on this type class and provides an instance for it.
Variant 1 is quite explicit but not very general. Potentially it results in functions with many arguments. Variant 2 is less verbose (see, for example, here). However, Variant 2 results in many unprincipled type classes, something considered bad practice in most modern Haskell textbooks and tutorials.
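For reference, the two variants might look roughly like this when squeezed into a single file. All types and names are hypothetical stand-ins for what would live in the U and P modules, and the persistence function is given an IO result here as an assumption.

```haskell
-- Would live in the use case module U (hypothetical types).
newtype OrderDto = OrderDto [String]
  deriving (Eq, Show)

data PersistenceResult = Saved | Failed
  deriving (Eq, Show)

-- Variant 1: the persistence function is an explicit argument.
placeOrderWith :: (OrderDto -> IO PersistenceResult) -> [String] -> IO PersistenceResult
placeOrderWith persist items = persist (OrderDto items)

-- Variant 2: U declares a type class; it never imports P.
class Monad m => Persistence m where
  persistOrder :: OrderDto -> m PersistenceResult

placeOrder :: Persistence m => [String] -> m PersistenceResult
placeOrder items = persistOrder (OrderDto items)

-- Would live in the persistence module P, which imports U.
instance Persistence IO where
  persistOrder (OrderDto items) = do
    putStrLn ("persisting " ++ show items)
    pure Saved
```

In both variants the source-code dependency points from P to U only; the difference is whether the dependency is supplied as an argument or resolved through instance selection.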
So, I am left with two questions:
Am I missing other alternatives?
Which approach is the generally recommended one, if any?
There are, indeed, other alternatives (see below).
While you can use partial application as dependency injection, I don't consider it a proper functional architecture, because it makes everything impure.
With your current example, it doesn't seem to matter too much, because U.placeOrder is already impure, but in general, you'd want your Haskell code to consist of as much referentially transparent code as possible.
You sometimes see a suggestion involving the Reader monad, where the 'dependencies' are passed to the function as the reader context instead of as straight function arguments, but as far as I can tell, these are just (isomorphic?) variations of the same idea, with the same problems.
Better alternatives are functional core, imperative shell, and free monads. There may be other alternatives as well, but these are the ones I'm aware of.
Functional core, imperative shell
You can often factor your code so that your domain model is defined as a set of pure functions. This is often easier to do in languages like Haskell and F# because you can use sum types to communicate decisions. The U.placeOrder function might, for example, look like this:
U.placeOrder :: D.Customer -> [D.LineItem] -> U.OrderPlacementDecision
Notice that this is a pure function, where U.OrderPlacementDecision might be a sum type that enumerates all the possible outcomes of the use case.
That's your functional core. You'd then compose your imperative shell (e.g. your main function) in an impureim sandwich:
main :: IO ()
main = do
  stuffFromDb <- -- call the persistence module code here
  customer -- initialised from persistence module, or some other place
  lineItems -- ditto
  let decision = U.placeOrder customer lineItems
  _ <- persist decision
  return ()
(I've obviously not tried to type-check that code, but I hope it's sufficiently correct to get the point across.)
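A version of the same sandwich that does type-check could look like the following sketch. The record fields, the decision constructors, the credit-limit rule and the stubbed load/persist functions are all invented for illustration; only the overall shape (impure read, pure decision, impure write) comes from the text above.

```haskell
-- Domain types (hypothetical).
data Customer = Customer { customerName :: String, creditLimit :: Int }
data LineItem = LineItem { itemName :: String, price :: Int }

-- A sum type enumerating the outcomes of the use case.
data OrderPlacementDecision
  = Accepted Int
  | RejectedEmpty
  | RejectedOverLimit Int
  deriving (Eq, Show)

-- Functional core: a pure function, no IO in sight.
placeOrder :: Customer -> [LineItem] -> OrderPlacementDecision
placeOrder _ [] = RejectedEmpty
placeOrder customer items
  | total > creditLimit customer = RejectedOverLimit total
  | otherwise                    = Accepted total
  where
    total = sum (map price items)

-- Imperative shell: stubs standing in for the persistence module.
loadCustomer :: IO Customer
loadCustomer = pure (Customer "Alice" 100)

loadLineItems :: IO [LineItem]
loadLineItems = pure [LineItem "tea" 30, LineItem "cup" 20]

persist :: OrderPlacementDecision -> IO ()
persist decision = putStrLn ("persisting " ++ show decision)

-- The sandwich: impure, pure, impure.
runPlaceOrder :: IO OrderPlacementDecision
runPlaceOrder = do
  customer <- loadCustomer
  items    <- loadLineItems
  let decision = placeOrder customer items
  persist decision
  pure decision
```

Only runPlaceOrder and the stubs would need to know about the persistence module; placeOrder can be tested without any IO at all.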
Free monads
The functional core, imperative shell is by far the simplest way to achieve the desired architectural outcome, and it's conspicuously often possible to get away with. Still, there are cases where that's not possible. In those cases, you can instead use free monads.
With free monads, you can define data structures that are roughly equivalent to object-oriented interfaces. Like in the functional core, imperative shell case, these data structures are sum types, which means that you can keep your functions pure. You can then run an impure interpreter over the generated expression tree.
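Here is a minimal sketch of that idea. To keep it self-contained it defines its own small Free type instead of importing the free package, and the instruction set (OrderF with LoadStock and SaveOrder), the stock-check rule and both interpreters are hypothetical examples.

```haskell
-- A minimal free monad, defined locally.
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

-- The "interface": one constructor per operation the use case needs.
data OrderF next
  = LoadStock String (Int -> next)
  | SaveOrder String Int next

instance Functor OrderF where
  fmap g (LoadStock item k)      = LoadStock item (g . k)
  fmap g (SaveOrder item n next) = SaveOrder item n (g next)

loadStock :: String -> Free OrderF Int
loadStock item = Free (LoadStock item Pure)

saveOrder :: String -> Int -> Free OrderF ()
saveOrder item n = Free (SaveOrder item n (Pure ()))

-- The use case builds an expression tree; it performs no effects itself.
placeOrder :: String -> Int -> Free OrderF Bool
placeOrder item qty = do
  stock <- loadStock item
  if stock >= qty
    then saveOrder item qty >> pure True
    else pure False

-- A pure interpreter, handy for tests: returns the result and the saved orders.
runPure :: [(String, Int)] -> Free OrderF a -> (a, [(String, Int)])
runPure _  (Pure a) = (a, [])
runPure db (Free (LoadStock item k)) = runPure db (k (maybe 0 id (lookup item db)))
runPure db (Free (SaveOrder item n next)) =
  let (a, saved) = runPure db next in (a, (item, n) : saved)

-- An impure interpreter; a real one would talk to a database.
runIO :: Free OrderF a -> IO a
runIO (Pure a) = pure a
runIO (Free (LoadStock item k)) = do
  putStrLn ("loading stock for " ++ item)
  runIO (k 10) -- hard-coded stock level, for illustration only
runIO (Free (SaveOrder item n next)) = do
  putStrLn ("saving order: " ++ show n ++ " x " ++ item)
  runIO next
```

The use case depends only on OrderF; which interpreter runs it is decided at the edge of the program.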
I've written an article series about how to think about dependency injection in F# and Haskell. I've also recently published an article that (among other things) showcases this technique. Most of my articles are accompanied by GitHub repositories.

Understanding STG

The design of GHC is based on something called STG, which stands for "spineless, tagless G-machine".
Now G-machine is apparently short for "graph reduction machine", which defines how laziness is implemented. Unevaluated thunks are stored as an expression tree, and executing the program involves reducing these down to normal form. (A tree is an acyclic graph, but Haskell's pervasive recursion means that Haskell expressions form general graphs, hence graph-reduction and not tree-reduction.)
What is less clear are the terms "spineless" and "tagless".
I think that "spineless" refers to the fact that function applications do not have a "spine" of function application nodes. Instead, you have an object that names the function called and points to all of its arguments. Is that correct?
I thought that "tagless" referred to constructor nodes not being "tagged" with a constructor ID, and instead case-expressions are resolved using a jump instruction. But now I'm not sure that's correct. Instead, it seems to refer to the fact that nodes aren't tagged with their evaluation state. Can anyone clarify which (if any) of these interpretations is correct?
GHC wiki contains an introductory article about STG written by Max Bolingbroke:
The STG machine is an essential part of GHC, the world's leading
Haskell compiler. It defines how the Haskell evaluation model should
be efficiently implemented on standard hardware. Despite this key
role, it is generally poorly understood amongst GHC users. This
document aims to provide an overview of the STG machine in its modern,
eval/apply-based, pointer-tagged incarnation by a series of simple
examples showing how Haskell source code is compiled.
You are right about "spineless", if I'm correct. It is basically described in the 1988 article by Burn, Peyton Jones and Robson, "The Spineless G-Machine". I've read it, but it is not so fresh in my mind.
Basically, on the G-Machine, all stack entries point to an application node except the one on the top, which points to the head of the expression. Those application nodes make access to the arguments indirect, and in some G-Machine descriptions, before applying a function the stack is rearranged, so that the last n nodes on the stack are made to point to the argument instead of the application node.
If I am not mistaken, the "Spineless" part is about avoiding having these application nodes (which are called the spine of the graph) on the stack altogether, thus avoiding that re-arrangement before each reduction.
As to the "Tagless" part, you are more correct now than you used to be, but... Using tags on nodes is a very, very old thing. Think about how a dynamically typed language such as LISP is implemented: every cell must have its value and a tag which says the type. If you want something, you must examine the tag and act accordingly. In the case of Haskell, the evaluation state is more important than the type, since Haskell is statically typed.
In the STG machine, tags are not used. Tags were replaced, maybe through inspiration of OO languages, by a set of function pointers. When you want the value of a node which has not been computed, the function will compute it. When it is already computed, the function returns it. This allows for a lot of creativity in what this function can do without making client code any more complex.
The "Tagless" part is described in SPJ's article "Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine".
There is also an objection to this "tagless" thing. Basically, it involves function pointers, which mean indirect jumps in computer architecture terms. And indirect jumps are an obstacle to branch prediction and hence to pipelining in general: either the architecture considers there to be a data dependency on the jump argument, halting the pipeline, or it assumes it does not know the destination and halts the pipeline.
The answer by migle describes exactly what the spinelessness and taglessness of the STGM mean. Today it is not worth trying to read much into the names of the two features, because the names stem from the history of graph reduction technologies: from the G-machine, to the Spineless G-machine, to the Spineless Tagless G-machine.
The G-machine uses both spine and tags. A spine is a list of edges from the root node of a function application to the node of the function. For example, a function application of "f e1 e2 ... en" is represented as
root = AP left_n en
left_n = AP left_n-1 en-1 ...
left_2 = AP left_1 e1
left_1 = FUN f
in G-machine, and so a spine is a list of edges consisting of left_n -> left_n-1 -> ... -> left_2 -> left_1. It is literally a spine of the function application!
In the same function application, there are tags AP and FUN.
In the next, more advanced G-machine, the so-called Spineless G-machine, there is no such spine: a function application is represented as a contiguous block whose first slot points to f, whose second slot points to e1, ..., and whose (n+1)-th slot points to en. In this representation, we do not need a spine. But the block starts with a special tag designating the number of slots and so on.
In the most advanced G-machine, the so-called Spineless Tagless G-machine, such a tag is replaced with a function pointer. To evaluate a function application is to jump to the code through the function pointer.
It is unfortunate that Simon Peyton Jones's STGM paper does not give compilation/evaluation rules at some abstract level, so it is only natural that people find it hard to understand the essence of the STGM.
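The difference between the first two representations can be modelled with ordinary Haskell data types. This is only a toy model of the heap layouts described above, not how any real implementation stores its nodes; the LIT constructor and the Flat type are assumptions added so the example has something to apply, and the function-pointer step of the tagless machine is not modelled.

```haskell
-- G-machine style: binary application nodes, tagged AP or FUN.
data Node
  = AP Node Node
  | FUN String
  | LIT Int
  deriving (Eq, Show)

-- "f e1 e2" as a spine of AP nodes:  AP (AP (FUN "f") e1) e2
example :: Node
example = AP (AP (FUN "f") (LIT 1)) (LIT 2)

-- Spineless style: one block holding the function and all of its arguments.
data Flat = Flat String [Node]
  deriving (Eq, Show)

-- Walking down the left spine collects the arguments in order; this is the
-- work that the flat representation makes unnecessary.
unwind :: Node -> Maybe Flat
unwind = go []
  where
    go args (AP f x)   = go (x : args) f
    go args (FUN name) = Just (Flat name args)
    go _    (LIT _)    = Nothing -- a literal cannot be applied
```

With the spine, reaching the function of an n-argument application takes n pointer hops; with the flat block it is the first slot.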
You want to read SPJ's book about functional PL implementation:
http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/index.htm

Where to apply Behavior (and other types) in FRP

I'm working on a program using reactive-banana, and I'm wondering how to structure my types with the basic FRP building blocks.
For instance, here's a simplified example from my real program: say my system is composed primarily of widgets — in my program, pieces of text that vary over time.
I could have
newtype Widget = Widget { widgetText :: Behavior String }
but I could also have
newtype Widget = Widget { widgetText :: String }
and use Behavior Widget when I want to talk about time-varying behaviour. This seems to make things "simpler", and means I can use Behavior operations more directly, rather than having to unpack and repack Widgets to do it.
On the other hand, the former seems to avoid duplication in the code that actually defines widgets, since almost all of the widgets vary over time, and I find myself defining even the few that don't with Behavior, since it lets me combine them with the others in a more consistent manner.
As another example, with both representations, it makes sense to have a Monoid instance (and I want to have one in my program), but the implementation for the latter seems more natural (since it's just a trivial lifting of the list monoid to the newtype).
(My actual program uses Discrete rather than Behavior, but I don't think that's relevant.)
Similarly, should I use Behavior (Coord,Coord) or (Behavior Coord, Behavior Coord) to represent a 2D point? In this case, the former seems like the obvious choice; but when it's a five-element record representing something like an entity in a game, the choice seems less clear.
In essence, all these problems reduce down to:
When using FRP, at what layer should I apply the Behavior type?
(The same question applies to Event too, although to a lesser degree.)
The rules I use when developing FRP applications, are:
1. Isolate the "thing that changes" as much as possible.
2. Group "things that change simultaneously" into one Behavior/Event.
The reason for (1) is that it becomes easier to create and compose abstract operations if the data types that you use are as primitive as possible.
The reason for this is that instances such as Monoid can be reused for raw types, as you described.
Note that you can use Lenses to easily modify the "contents" of a datatype as if they were raw values, so that extra "wrapping/unwrapping" isn't a problem, mostly. (See this recent tutorial for an introduction to this particular Lens implementation; there are others)
The reason for (2) is that it just removes unnecessary overhead. If two things change simultaneously, they "have the same behavior", so they should be modeled as such.
Ergo/tl;dr: You should use newtype Widget = Widget { widgetText :: Behavior String } because of (1), and you should use Behavior (Coord, Coord) because of (2) (since both coordinates usually change simultaneously).
I agree with dflemstr's advice to
1. Isolate the "thing that changes" as much as possible.
2. Group "things that change simultaneously" into one Behavior/Event.
and would like to offer additional reasons for these rules of thumb.
The question boils down to the following: you want to represent a pair (tuple) of values that change in time and the question is whether to use
a. (Behavior x, Behavior y) - a pair of behaviors
b. Behavior (x,y) - a behavior of pairs
Reasons for preferring one over the other are
a over b.
In a push-driven implementation, the change of a behavior will trigger a recalculation of all behaviors that depend on it.
Now, consider a behavior whose value depends only on the first component x of the pair. In variant a, a change of the second component y will not recompute the behavior. But in variant b, the behavior will be recalculated, even though its value does not depend on the second component at all. In other words, it's a question of fine-grained vs coarse-grained dependencies.
This is an argument for advice 1. Of course, this is not of much importance when both behaviors tend to change simultaneously, which yields advice 2.
Of course, the library should offer a way to offer fine-grained dependencies even for variant b. As of reactive-banana version 0.4.3, this is not possible, but don't worry about that for now, my push-driven implementation is going to mature in future versions.
b over a.
Seeing that reactive-banana version 0.4.3 does not offer dynamic event switching yet, there are certain programs that you can only write if you put all components in a single behavior. The canonical example would be a program that features a variable number of counters, i.e. an extension of the TwoCounter.hs example. You have to represent it as a time-changing list of values
counters :: Behavior [Int]
because there is no way to keep track of a dynamic collection of behaviors yet. That said, the next version of reactive-banana will include dynamic event switching.
Also, you can always convert from variant a to variant b without any trouble
uncurry (liftA2 (,)) :: (Behavior a, Behavior b) -> Behavior (a,b)
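The conversion only needs an Applicative instance, so it can be tried out without reactive-banana. The sketch below uses a stand-in Behavior modelled as a function of time, which is an assumption for illustration and not the library's implementation; pairB and unpairB are made-up names.

```haskell
import Control.Applicative (liftA2)

type Time = Double

-- Stand-in model: a behavior is a value that depends on time.
newtype Behavior a = Behavior { at :: Time -> a }

instance Functor Behavior where
  fmap f (Behavior g) = Behavior (f . g)

instance Applicative Behavior where
  pure = Behavior . const
  Behavior f <*> Behavior g = Behavior (\t -> f t (g t))

-- Variant a to variant b: works for any Applicative.
pairB :: (Behavior a, Behavior b) -> Behavior (a, b)
pairB = uncurry (liftA2 (,))

-- Variant b to variant a: always expressible, but each component
-- now depends on the whole pair.
unpairB :: Behavior (a, b) -> (Behavior a, Behavior b)
unpairB b = (fst <$> b, snd <$> b)
```

Going from b back to a with fmap is what loses the fine-grained dependency information in a push-driven implementation: both projections are recomputed whenever either component changes.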

Why does Haskell not have an I Monad (for input only, unlike the IO monad)?

Conceptually, it seems that a computation that performs output is very different from one that performs input only. The latter is, in one sense, much purer.
I, for one, would like to have a way to separate the input only parts of my programme from the ones that might actually write something out.
So, why is there no input only Monad?
Any reason why it wouldn't work to have an I monad (and an O Monad, which could be combined into the IO Monad)?
Edit: I mostly meant input as reading files, not interacting with the user. This is also my use case, where I can assume that input files do not change during the execution of the programme (otherwise, it's fine to get undefined behaviour).
I disagree with bdonlan's answer. It's true that neither input nor output are more "pure" but they are quite different. It's quite valid to critique IO as the single "sin bin" where all effects get crammed together, and it does make ensuring certain properties harder. For example, if you have many functions that you know only read from certain memory locations, and which could never cause those locations to be altered, it would be nice to know that you can reorder their execution. Or if you have a program that uses forkIO and MVars, it would be nice to know, based on its type, that it isn't also reading /etc/passwd.
Furthermore, one can compose monadic effects in a fashion besides just stacked transformers. You can't do this with all monads (just free monads), but for a case like this that's all you really need. The iospec package, for example, provides a pure specification of IO -- it doesn't separate reading and writing, but it does separate them from, e.g., STM, MVars, forkIO, and so forth.
http://hackage.haskell.org/package/IOSpec
The key ideas for how you can combine the different monads cleanly are described in the Data Types a la Carte paper (great reading, very influential, can't recommend enough, etc.etc.).
The 'Input' side of the IO monad is just as much output as it is input. If you consume a line of input, the fact that you consumed that input is communicated to the outside, and also serves to be recorded as impure state (ie, you don't consume the same line again later); it's just as much an output operation as a putStrLn. Additionally, input operations must be ordered with respect to output operations; this again limits how much you can separate the two.
If you want a pure read-only monad, you should probably use the reader monad instead.
That said, you seem to be a bit confused about what combining monads can do. While you can indeed combine two monads (assuming one is a monad transformer) and get some kind of hybrid semantics, you have to be able to run the result. That is, even if you could define an IT (OT Identity) r, how do you run it? You have no root IO monad in this case, so main must be a pure function. Which would mean you'd have main = runIdentity . runOT . runIT $ .... Which is nonsense, since you're getting impure effects from a pure context.
In other words, the type of the IO monad has to be fixed. It can't be a user-selectable transformed type, because its type is nailed down into main. Sure, you could call it I (O Identity), but you don't gain anything; O (I Identity) would be a useless type, as would be I [] or O Maybe, because you'd never be able to run any of these.
Of course, if IO is left as the fundamental IO monad type, you could define routines like:
runI :: I Identity r -> IO r
This works, but again, you can't have anything underneath this I monad very easily, and you're not gaining much from this complexity. What would it even mean to have an Output monad transformed over a List base monad, anyway?
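A simplified version of this can be written as a plain newtype over IO rather than a transformer. Everything here is a sketch: I, readFileI and countLines are made-up names, and the restriction only holds if the I constructor is kept private to its module.

```haskell
-- In a real module the constructor I would not be exported, so client code
-- could only build I actions from the read-only primitives below.
newtype I a = I (IO a)

instance Functor I where
  fmap f (I io) = I (fmap f io)

instance Applicative I where
  pure = I . pure
  I f <*> I x = I (f <*> x)

instance Monad I where
  I x >>= k = I (x >>= \a -> let I y = k a in y)

-- The only primitive: reading a file.
readFileI :: FilePath -> I String
readFileI = I . readFile

-- Embed an input-only computation into ordinary IO.
runI :: I a -> IO a
runI (I io) = io

-- The type says this function may read files but cannot write anything.
countLines :: FilePath -> I Int
countLines path = length . lines <$> readFileI path
```

This gives the type-level documentation the question asks for, but, as noted above, it is only a convention layered over IO; the ordering constraints between input and output do not go away.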
When you obtain input, you cause side-effects that change both the state of the outside world (the input is consumed) and your program (the input is used). When you output, you cause side-effects that only change the state of the outside world (output is produced); the act of outputting itself does not change the state of your program. So you might actually say that O is more "pure" than I.
Except that output does actually change the execution state of your program (It won't repeat the same output operation over and over; it has to have some sort of state change in order to move on). It all depends on how you look at it. But it's so much easier to lump the dirtiness of input and output into the same monad. Any useful program will both input and output. You can categorize the operations you use into one or the other, but I'm not seeing a convincing reason to employ the type system for the task.
Either you're messing with the outside world or you're not.
Short answer: IO is not I/O.
Other folks have longer answers if you like.
I think the division between pure and impure code is somewhat arbitrary. It depends on where you put the barrier. Haskell's designers decided to clearly separate the pure functional part of the language from the rest.
So we have the IO monad, which incorporates all the possible effects (as different as disk reads/writes, networking, and memory access). The language enforces a clear division by means of the return type, and this induces a kind of thinking which divides everything into pure and impure.
Where information security is a concern, it would be quite natural to separate reading and writing. But for Haskell's initial goal, to be a standard lazy pure functional language, it was overkill.

Creative uses of monads

I'm looking for creative uses of monads to learn from. I've read somewhere that monads have been used for example in AI, but being a monad newbie, I fail to see how.
Please include a link to the source code and sample usages. No standard monads please.
Phil Wadler has written many papers on monads, but the one to read first is a lot of fun and will be accessible to any programmer; it's called The essence of functional programming. The paper includes source code and sample usages.
A personal favorite of mine is the probability monad; if you can find Sungwoo Park's PhD thesis, it has a number of interesting example codes from robotics.
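For a flavour of what a probability monad looks like, here is a very small sketch based on weighted lists of outcomes. This is the simple discrete formulation, not necessarily the one used in the thesis mentioned above, and all names (Dist, uniform, probability, twoDice) are chosen for illustration.

```haskell
-- A finite distribution: outcomes paired with their probabilities.
newtype Dist a = Dist { runDist :: [(a, Rational)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [(f a, p) | (a, p) <- xs]

instance Applicative Dist where
  pure a = Dist [(a, 1)]
  Dist fs <*> Dist xs = Dist [(f a, p * q) | (f, p) <- fs, (a, q) <- xs]

-- Bind multiplies probabilities along each path of dependent choices.
instance Monad Dist where
  Dist xs >>= k = Dist [(b, p * q) | (a, p) <- xs, (b, q) <- runDist (k a)]

-- Every element of a non-empty list is equally likely.
uniform :: [a] -> Dist a
uniform xs = Dist [(x, 1 / fromIntegral (length xs)) | x <- xs]

-- Total probability of the outcomes satisfying a predicate.
probability :: (a -> Bool) -> Dist a -> Rational
probability ok (Dist xs) = sum [p | (a, p) <- xs, ok a]

-- The sum of two fair dice, written like ordinary sequential code.
twoDice :: Dist Int
twoDice = do
  a <- uniform [1 .. 6]
  b <- uniform [1 .. 6]
  pure (a + b)
```

For example, probability (== 7) twoDice evaluates to 1/6, and the do block reads exactly like the informal description of the experiment.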
There's also LogicT (backtracking monad transformer with fair operations and pruning).
It is valuable for AI search algorithms because of its constructs for fair disjunction, which make it easy, for example, to combine (interleave) computations that each succeed an infinite number of times.
Its usage is described in the ICFP'05 paper Backtracking, Interleaving, and Terminating Monad Transformers.
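The point of fair disjunction can be illustrated on plain lists, without the LogicT package. The sketch below is only a list analogue of the idea (the `interleave` defined here is a local function written for this example, not the library's class method): ordinary concatenation never reaches the second infinite source, while interleaving alternates between the two.

```haskell
-- Fair disjunction on plain lists: alternate between the two sources
-- by swapping the arguments at every step.
interleave :: [a] -> [a] -> [a]
interleave []       ys = ys
interleave (x : xs) ys = x : interleave ys xs

evens, odds :: [Integer]
evens = [0, 2 ..]
odds  = [1, 3 ..]
```

Here `take 6 (evens ++ odds)` gives `[0,2,4,6,8,10]` and never sees an odd number, while `take 6 (interleave evens odds)` gives `[0,1,2,3,4,5]`.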
You can find interesting and advanced monads in the blog A Neighborhood of Infinity. I would single out the Vector Space Monad and its use for describing rational tangles. Unfortunately, I don't think I understand this well enough to explain it here.
One of my favorite monads is Martin Escardo's search monad. It can be found on Hackage in the infinite-search package.
It is the monad of "search functions" for a set of elements of type a, namely (a -> Bool) -> Maybe a (finding an element in the set matching a given predicate).
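To make the type concrete, here is a small sketch of search functions of the shape `(a -> Bool) -> Maybe a`. It only covers finite sets backed by a list and does not attempt the Monad instance or the infinite sets that make the real package interesting; the names `Search`, `fromList`, `union` and `exists` are chosen for this example and are not claimed to match the package's API.

```haskell
import Data.List (find)
import Data.Maybe (isJust)

-- A "set" represented by its search function: given a predicate,
-- return some element satisfying it, if there is one.
newtype Search a = Search { search :: (a -> Bool) -> Maybe a }

-- A finite set backed by a list.
fromList :: [a] -> Search a
fromList xs = Search (\p -> find p xs)

-- Search the first set, falling back to the second.
union :: Search a -> Search a -> Search a
union (Search f) (Search g) = Search $ \p ->
  case f p of
    Just x  -> Just x
    Nothing -> g p

-- Existential quantification falls out of the search function.
exists :: Search a -> (a -> Bool) -> Bool
exists s p = isJust (search s p)
```

For example, `search (fromList [1 .. 10]) (> 7)` yields `Just 8`, and `exists (fromList [1 .. 10]) (> 10)` is `False`.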
One interesting use of monad is in parsing. Parsec is the standard example.
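The following is a minimal sketch of the idea behind monadic parsing, written against `base` only. It is not Parsec: the combinator names (`satisfy`, `char`, `many1`) are defined locally for this example and merely echo the names parser libraries tend to use, and there is no error reporting.

```haskell
import Data.Char (isDigit)

-- A parser consumes a prefix of the input and returns a result
-- together with the remaining input, or fails with Nothing.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad Parser where
  -- Run the first parser, then feed the leftover input to the next.
  Parser p >>= k = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> runParser (k a) rest

-- Accept one character satisfying the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case s of
  (c : cs) | ok c -> Just (c, cs)
  _               -> Nothing

char :: Char -> Parser Char
char c = satisfy (== c)

-- Zero or more repetitions (the argument must consume input to terminate).
many0 :: Parser a -> Parser [a]
many0 p = Parser $ \s -> case runParser p s of
  Nothing        -> Just ([], s)
  Just (a, rest) -> runParser (fmap (a :) (many0 p)) rest

-- One or more repetitions.
many1 :: Parser a -> Parser [a]
many1 p = do
  x  <- p
  xs <- many0 p
  return (x : xs)

number :: Parser Int
number = read <$> many1 (satisfy isDigit)

-- Parses input such as "12+30" and returns the sum.
sumExpr :: Parser Int
sumExpr = do
  a <- number
  _ <- char '+'
  b <- number
  return (a + b)
```

Running `runParser sumExpr "12+30"` gives `Just (42,"")`; the do-block reads like the grammar while the monad threads the remaining input behind the scenes.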
Read the series of articles on monads used to model probability and probabilistic processes here: http://www.randomhacks.net/articles/2007/03/03/smart-classification-with-haskell (follow the links to the previous/next parts).
Harpy, a package for run-time generation of x86 machine code, uses a code generation monad. From the description:
This is a combined reader-state-exception monad which handles all the details of handling code buffers, emitting binary data, relocation etc.
All the code generation functions in module Harpy.X86CodeGen live in this monad and use its error reporting facilities as well as the internal state maintained by the monad.
The library user can pass a user environment and user state through the monad. This state is independent from the internal state and may be used by higher-level code generation libraries to maintain their own state across code generation operations.
I found this a particularly interesting example because I think that this pattern is not uncommon: I'd invented something quite similar myself for generating a set of internal messages for my application based on messages received from a (stock) market data feed. It turns out to be an extremely comfortable way to have a framework keep track of various "global" things whilst composing simple operations that in and of themselves keep no state.
I took the idea of a user state (which I call a "substate") passed through the monad one step further: I have a mechanism for switching out and restoring state during the monad run:
-- | Given a generator that uses a different substate type, convert it
-- to a generator that runs with our substate type. As well as the
-- other-substate-type generator, the caller must provide an initial
-- substate for that generator and a function taking the final substate
-- of the generator and producing a new substate of our type. This
-- preserves all other (non-substate) parts of the master state touched
-- by the generator.
--
mgConvertSubstate :: MsgGen msg st' a -> st' -> (st' -> st) -> MsgGen msg st a
This is used for subgroups of combinators that need their own state for a short period. These run with just their state, not knowing anything about the state of the generator that invoked them (which helps make things more modular), and yet this preserves any non-user-specific state, such as the current list of messages generated and the current set of warnings or errors, as well as the control flow (i.e., allowing total aborts to flow upwards).
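The actual MsgGen type is not shown above, so the following is only a sketch of how such a conversion could work, under the assumption that the generator is a state-like monad carrying a message list plus a substate. The type `Gen` and the helpers `emit`, `getSub` and `putSub` are hypothetical stand-ins; warnings, errors and aborts are left out.

```haskell
-- Hypothetical stand-in for the generator: threads a list of emitted
-- messages (shared "master" state) and a substate of type st.
newtype Gen msg st a = Gen { runGen :: [msg] -> st -> (a, [msg], st) }

instance Functor (Gen msg st) where
  fmap f (Gen g) = Gen $ \ms st ->
    let (a, ms', st') = g ms st in (f a, ms', st')

instance Applicative (Gen msg st) where
  pure a = Gen $ \ms st -> (a, ms, st)
  gf <*> ga = gf >>= \f -> fmap f ga

instance Monad (Gen msg st) where
  Gen g >>= k = Gen $ \ms st ->
    let (a, ms', st') = g ms st in runGen (k a) ms' st'

emit :: msg -> Gen msg st ()
emit m = Gen $ \ms st -> ((), ms ++ [m], st)

getSub :: Gen msg st st
getSub = Gen $ \ms st -> (st, ms, st)

putSub :: st -> Gen msg st ()
putSub st = Gen $ \ms _ -> ((), ms, st)

-- Run a generator with a different substate type: start it from the
-- given initial substate, keep the messages it emits, and map its
-- final substate back into our substate type.
convertSubstate :: Gen msg st' a -> st' -> (st' -> st) -> Gen msg st a
convertSubstate (Gen inner) initial finish = Gen $ \ms _ ->
  let (a, ms', final) = inner ms initial
  in (a, ms', finish final)

-- An inner generator that only knows about its own Int counter.
tickTwice :: Gen String Int ()
tickTwice = do
  n <- getSub
  emit ("tick " ++ show n)
  putSub (n + 1)
  m <- getSub
  emit ("tick " ++ show m)
  putSub (m + 1)

-- An outer generator whose substate is a String.
outer :: Gen String String ()
outer = do
  emit "start"
  convertSubstate tickTwice 0 (\n -> "counted " ++ show n)
  emit "end"
```

Evaluating `runGen outer [] "initial"` yields `((),["start","tick 0","tick 1","end"],"counted 2")`: the message list flows through the inner generator untouched by the substate switch, while the inner code never sees the outer String substate.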
I'd like to list a couple of monads not yet mentioned in other answers.
Enumerate and weighted search monads
The Omega monad can be used to productively traverse infinite lists of results. Compare:
>>> take 10 $ liftM2 (,) [0..] [0..]
[(0,0),(0,1),(0,2),(0,3),(0,4),(0,5),(0,6),(0,7),(0,8),(0,9)]
>>> take 10 $ runOmega $ liftM2 (,) (each' [0..]) (each' [0..])
[(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0)]
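The order of the second result is a diagonal enumeration. As a sketch of that traversal for the specific case of pairs of naturals (this is not how the Omega library is implemented, and `diagonalPairs` is a name made up here), one can enumerate by the sum of the two components:

```haskell
-- Enumerate pairs of naturals diagonal by diagonal: all pairs whose
-- components sum to 0, then to 1, then to 2, and so on.
diagonalPairs :: [(Integer, Integer)]
diagonalPairs = [ (i, n - i) | n <- [0 ..], i <- [0 .. n] ]

-- The plain list-monad product, for comparison: the second generator
-- is infinite, so the first component never advances past 0.
rowMajorPairs :: [(Integer, Integer)]
rowMajorPairs = [ (i, j) | i <- [0 ..], j <- [0 ..] ]
```

Every pair appears at a finite position in `diagonalPairs`, whereas `(1,0)` is never reached in `rowMajorPairs`.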
With the slightly more advanced WeightedSearch monad it is also possible to assign weights to computations, so that results of computations with lower weights appear first in the output.
Accumulating errors monad
The useful These data type forms a Monad similar to Either, but one that is able to accumulate errors rather than only report the first. The package also defines the MonadChronicle class as well as the ChronicleT monad transformer based on These.
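A self-contained sketch of the idea follows. The data type is redefined locally so that the example needs only `base`; the real package may differ in details, and `warn`, `fatal`, `checkAge` and `checkBoth` are hypothetical helpers for illustration. Here `This` acts as a fatal error, `That` as plain success, and `These` as success with accumulated warnings.

```haskell
-- Locally defined for this sketch; the package provides its own version.
data These a b = This a | That b | These a b
  deriving (Eq, Show)

instance Functor (These a) where
  fmap _ (This a)    = This a
  fmap f (That b)    = That (f b)
  fmap f (These a b) = These a (f b)

instance Semigroup a => Applicative (These a) where
  pure = That
  tf <*> tx = tf >>= \f -> fmap f tx

instance Semigroup a => Monad (These a) where
  This a    >>= _ = This a          -- fatal: stop here
  That b    >>= k = k b             -- clean success: just continue
  These a b >>= k = case k b of     -- continue, carrying the errors along
    This a'    -> This (a <> a')
    That c     -> These a c
    These a' c -> These (a <> a') c

-- Record a non-fatal problem but keep the value.
warn :: String -> b -> These [String] b
warn msg x = These [msg] x

-- Record a problem and stop.
fatal :: String -> These [String] b
fatal msg = This [msg]

checkAge :: Int -> These [String] Int
checkAge n
  | n < 0     = fatal "negative age"
  | n > 150   = warn "implausibly old" n
  | otherwise = That n

checkBoth :: Int -> Int -> These [String] Int
checkBoth a b = do
  x <- checkAge a
  y <- checkAge b
  return (x + y)
```

With these definitions, `checkBoth 20 30` is `That 50`, `checkBoth 200 300` is `These ["implausibly old","implausibly old"] 500`, and `checkBoth 200 (-1)` is `This ["implausibly old","negative age"]`, so earlier warnings survive even when a later step fails.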

Resources