How can I Implement Statecharts in Haskell? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
After having read the excellent book "Practical UML Statecharts in
C/C++" by Miro Samek, I am eager to try them out sometime. More
recently, I have started to teach myself Haskell and functional
programming.
Only a few chapters into my book on Haskell, it struck me that
statecharts might be difficult, and even against the grain of Haskell. After all, a large effort
goes into creating state-free programs, or at least keeping all impure
parts of the code well separated from the pure ones.
When I searched for "Haskell" combined with "statechart" on the net, I
found almost nothing! This piqued my curiosity. Haskell is, after all, a
reasonably old, general purpose programming language. How could it be
that there seems to be virtually no activity related to statecharts?
Several possible explanations were suggested in the Haskell sub-Reddit
thread "haskellers thoughts on statecharts", and I hope my short
excerpts won't warp the views of any of the authors:
1. /…/ the Haskell community in general is not very active in UI
   development /…/
2. /…/ The downside of Statecharts is that all that flexibility
   that they provide make modelchecking (and therefor also model-based
   testing) harder. /…/
3. /…/ A state machine is a collection of states and transition
   rules. But for implementation states are redundant, you can work with
   transition rules alone, which are neatly represented as (a mutually
   recursive family of) functions. /…/ But Haskell is free from this
   restriction, so implementing state machine explicitly is usually
   redundant.
   Hense (sic!), as a Haskeller I don't really need explicit state
   machine most of time. Functions as first-class citizens and TCO make
   them an implementation detail at best and unneeded at worst. /…/
4. a. /…/ You're probably more lucky with search-terms like
   "transition system", "actor model" or "state transducer" in the
   Haskell ecosystem. /…/
   b. /…/ Functional reactive programming (FRP) and arrows are ways of
   implementing signal flow in Haskell. /…/
   c. /…/ The State monad transformer can model that. If you have
   external signals in an intermediate step, you can embed it in a
   continuation monad. /…/
Allow me to interpret and comment on this a bit:
1. I find the notion that Haskell programmers don't use statecharts since
   they don't do that kind of programming a bit hard to believe. I may
   be wrong, but I would think the field of application is much broader
   than just UIs.
2. This argument is slightly above my head. The author perhaps focuses
   on a certain type of testing, a certain tool. I mean, wouldn't a
   well-defined set of states, up front, make testing a lot easier, if
   anything?
3. This is perhaps the most interesting argument: that explicit states
   would be superfluous in Haskell, and hence so would the need to code
   visible state machines. My objection is perhaps unfair, but wasn't the
   state machine abstraction (or rather, its visible incarnation in code)
   invented in order to make things clearer and easier to understand?
   The problem it solves is to make more visible the unspecified, de facto
   state machine hidden in "ordinary" programs, which make decisions based
   on the values of a large number of poorly organised variables. Isn't
   this negated by getting rid of the state machine?
4. Could it really be this simple, that statecharts are there, but are
   simply called something completely different? … or are these
   better solutions for the same problem, in the world of Haskell,
   effectively making statecharts genuinely redundant?
Will all this become embarrassingly obvious once I have finished reading the book on Haskell and FP?

You could do it with state charts, but you don't need to because Haskell works at a higher level than other languages, and subsumes the state chart design into code. Here is how you do it. (Note: I'm simplifying this down from a monad transformer version which also handles exceptions, so sorry if I've made any mistakes)
First, you can define a state machine in Haskell like this:
newtype AutoS i o = AutoS {runAutoS :: (o, i -> AutoS i o)}
In other words a state machine consists of its latest output o and a function from an input i to the next state. I've called it "Auto" because these are automata, which is maths jargon for state machines.
Now you could just take a UML state chart and translate it directly into a collection of AutoS values. But there is a better way.
newtype Auto i o a = Auto {runAuto :: (a -> AutoS i o) -> AutoS i o}
This is a variation on the continuation monad. With a conventional monad the result from each step is used as the argument to the next, but in continuations each step gets the next step as a function argument and passes its result to that function. That sounds like a weird way of doing things, but it means that you get access to the "rest of the computation" from within the monad, which lets you do some clever things. But before explaining that, here are the instances. It's worth spending a bit of time meditating on the Monad instance in particular. In all these functions k is the continuation; the parameter representing the rest of the computation.
instance Functor (Auto i o) where
  fmap f (Auto act) = Auto $ \k -> act (k . f)
instance Applicative (Auto i o) where
  pure v = Auto $ \k -> k v
  f <*> v = Auto $ \k -> runAuto f $ \g -> runAuto v (k . g)
instance Monad (Auto i o) where
  return = pure
  v >>= f = Auto $ \k -> runAuto v $ \x -> runAuto (f x) k
So now we can write actions in Auto. But what can they do? Here is the yield function, which is the only primitive in Auto:
yield :: o -> Auto i o i
yield v = Auto $ \k -> AutoS (v, k)
This is the magic bit. Like before, k is the continuation (i.e. everything that follows this step). Remember that the result of the step (i.e. the result of yield) gets passed to this function. By wrapping it up in an AutoS with the yielded value we are passing that value as the output of the state machine and giving the state machine caller the next state transition.
So now instead of writing a program as a state machine with lots of named states we can just write monadic code. When our monadic code wants to exchange information with the rest of the world it uses yield to send out a value and get a new one in return.
There is an old joke about a mathematician who is asked to capture a lion in a cage. The mathematician gets into the cage, closes the door, and declares (by a geometric inversion) that he is now outside the cage and everything else, lion included, is inside the cage. This continuation monad is a bit like that; your monadic code is like the mathematician; it passes a value into yield and gets a result back out. But from the point of view of the rest of the world the yielded value comes out of the state machine (the cage), and the next state transition goes back in to become the result of yield.
The only other thing you need here is a way to run an Auto action:
startMachine :: Auto i o Void -> AutoS i o
startMachine act = runAuto act $ error "Can't happen: Auto terminated."
This assumes that your state machines have no end state. Void comes from Data.Void, which is part of base in current GHC releases (it originally came from the void package). If you need an end state then it has to have a separate type and that type has to tramp around along with everything else. In the original Cont continuation monad that type is r.
The big advantage of doing it this way is that the logic of the state machine is expressed as a sequence of steps just like a normal program. Without the monadic approach all you have is a list of states and transitions, which makes following the logic very hard. It's like reading a program where every line ends in a GOTO. When the code is incomprehensible you have to use a design notation to explain it. State charts are a bit like a flow chart for state machines. By using a monad you can program directly at the state chart level, rendering it irrelevant in exactly the same way that structured programming rendered flow charts irrelevant.
Creating a monad transformer version AutoT is left as an exercise for the student. Hint: newtype AutoS i o m = AutoS {runAutoS :: m (o, i -> AutoS i o m)}, and then take a look at ContT in the transformers package.
Actually you could do this by replacing Auto with Cont. I didn't because I also needed exceptions and ContT doesn't do exceptions. Adding something like proper exception handling is also left as an exercise for the student. Hint: newtype AutoS i o e m = AutoS {runAutoS :: m (o, i -> AutoS i o e m, e -> AutoS i o e m) }
Edit: You mentioned the ease of testing the implementation of a state chart.
Yes, for a sufficiently small value of "test". If you implement a state chart in a conventional language using states indexed by an enumeration then testing that your state transitions match the state chart is indeed trivial. However you still need to verify that the state chart in your design was correct. This is exactly the same problem as verifying that monadic code in Auto is correct because they are both describing the solution at the same level of abstraction.
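To make the "just write monadic code" claim concrete, here is a minimal sketch that assembles the definitions above into one module and uses them for a small machine. The turnstile, the Event and Output types, and the helpers waitFor and runOn are hypothetical names invented for this illustration; they are not part of the answer's original code.

```haskell
import Control.Monad (forever)
import Data.Void (Void)

-- The two types from the answer above.
newtype AutoS i o = AutoS { runAutoS :: (o, i -> AutoS i o) }
newtype Auto i o a = Auto { runAuto :: (a -> AutoS i o) -> AutoS i o }

instance Functor (Auto i o) where
  fmap f (Auto act) = Auto $ \k -> act (k . f)

instance Applicative (Auto i o) where
  pure v = Auto $ \k -> k v
  f <*> v = Auto $ \k -> runAuto f $ \g -> runAuto v (k . g)

instance Monad (Auto i o) where
  v >>= f = Auto $ \k -> runAuto v $ \x -> runAuto (f x) k

-- Emit an output and suspend until the next input arrives.
yield :: o -> Auto i o i
yield v = Auto $ \k -> AutoS (v, k)

startMachine :: Auto i o Void -> AutoS i o
startMachine act = runAuto act $ error "Can't happen: Auto terminated."

-- Hypothetical example machine: a coin-operated turnstile.
data Event = Coin | Push deriving (Eq, Show)
data Output = Locked | Unlocked deriving (Eq, Show)

-- Keep showing `out` until the wanted event arrives.
waitFor :: Event -> Output -> Auto Event Output ()
waitFor wanted out = do
  ev <- yield out
  if ev == wanted then pure () else waitFor wanted out

-- The whole state chart, written as a sequence of steps.
turnstile :: Auto Event Output Void
turnstile = forever $ do
  waitFor Coin Locked    -- locked until a coin is inserted
  waitFor Push Unlocked  -- unlocked until someone pushes through

-- Hypothetical driver: feed a list of inputs, collect the outputs.
runOn :: AutoS i o -> [i] -> [o]
runOn (AutoS (o, next)) inputs = o : case inputs of
  []       -> []
  (i : is) -> runOn (next i) is
```

In GHCi, runOn (startMachine turnstile) [Push, Coin, Coin, Push, Push] evaluates to [Locked,Locked,Unlocked,Unlocked,Locked,Locked]: one initial output plus one per input. The two states never appear as named values; they are simply the two positions in the do block.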

Related

Is there a real-world applicability for the continuation monad outside of academic use?

(later visitors: two answers to this question both give excellent insight, if you are interested you should probably read them both, I could accept only one, which is a limitation of SO)
All the discussions of the continuation monad I find online either mention how it could be used with some trivial examples, or explain that it is a fundamental building block, as in this article on how the mother of all monads is the continuation monad.
I wonder if there is applicability outside of this range. I mean, does it make sense to wrap a recursive function, or mutual recursion in a continuation monad? Does it aid in readability?
Here's an F# version of the continuation monad taken from this SO post:
type ContinuationMonad() =
    member this.Bind (m, f) = fun c -> m (fun a -> f a c)
    member this.Return x = fun k -> k x
let cont = ContinuationMonad()
Is it merely of academic interest, for instance to help understand monads, or computation builders? Or is there some real-world applicability, added type-safety, or does it circumvent typical programming problems that are hard to solve otherwise?
I.e., the continuation monad with call/cc from Ryan Riley shows that it is complex to handle exceptions, but it doesn't explain what problem it is trying to solve and the examples don't show why it needs specifically this monad. Admittedly, I just don't understand what it does, but it may be a treasure trove!
(Note: I am not interested in understanding how the continuation monad works, I think I have a fair grasp of it, I just don't see what programming problem it solves.)

The "mother of all monads" stuff is not purely academic. Dan Piponi references Andrzej Filinski's Representing Monads, a rather good paper. The upshot of it is if your language has delimited continuations (or can mimic them with call/cc and a single piece of mutable state) then you can transparently add any monadic effect to any code. In other words, if you have delimited continuations and no other side effects, you can implement (global) mutable state or exceptions or backtracking non-determinism or cooperative concurrency. You can do each of these just by defining a few simple functions. No global transformation or anything needed. Also, you only pay for the side-effects when you use them. It turns out the Schemers were completely right about call/cc being highly expressive.
If your language doesn't have delimited continuations, you can get them via the continuation monad (or better the double-barrelled continuation monad). Of course, if you're going to write in monadic-style anyway – which is a global transformation – why not just use the desired monad from the get-go? For Haskellers, this is typically what we do, however, there are still benefits from using the continuation monad in many cases (albeit hidden away). A good example is the Maybe/Option monad which is like having exceptions except there's only one type of exception. Basically, this monad captures the pattern of returning an "error code" and checking it after each function call. And that's exactly what the typical definition does, except by "function call" I meant every (monadic) step of the computation. Suffice to say, this is pretty inefficient, especially when the vast majority of the time there is no error. If you reflect Maybe into the continuation monad though, while you have to pay the cost of the CPSed code (which GHC Haskell handles surprisingly well), you only pay to check the "error code" in places where it matters, i.e. catch statements. In Haskell, the Codensity monad that danidiaz mentioned is a better choice because the last thing Haskellers want is to make it so that arbitrary effects can be transparently interleaved in their code.
As danidiaz also mentioned, many monads are more easily or more efficiently implemented using essentially a continuation monad or some variant. Backtracking search is one example. While not the newest work on backtracking, one of my favorite papers that used it was Typed Logical Variables in Haskell. The techniques used in it were also used in the Wired Hardware Description Language. Also from Koen Claessen is A Poor Man's Concurrency Monad. More modern uses of the ideas in this example include the monad for deterministic parallelism in Haskell (A Monad for Deterministic Parallelism) and scalable I/O managers (Combining Events And Threads For Scalable Network Services). I'm sure I can find similar techniques used in Scala. If it wasn't provided, you could use a continuation monad to implement asynchronous workflows in F#. In fact, Don Syme references exactly the same papers I just referenced. If you can serialize functions but don't have continuations, you can use a continuation monad to get them and do the serialized continuation type of web programming made popular by systems like Seaside. Even without serializable continuations, you can use the pattern (essentially the same as async) to at least avoid callbacks while storing the continuations locally and only sending a key.
Ultimately, relatively few people outside of Haskellers are using monads in any capacity, and as I alluded to earlier, Haskellers tend to want to use more controllable monads than the continuation monad, though they do use them internally quite a bit. Nevertheless, continuation monads or continuation monad like things, particularly for asynchronous programming, are becoming less uncommon. As C#, F#, Scala, Swift, and even Java start incorporating support for monadic or at least monadic-style programming, these ideas will become more broadly used. If the Node developers were more conversant with this, maybe they would have realized you could have your cake and eat it too with regards to event-driven programming.
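As a rough illustration of the "reflect Maybe into the continuation monad" remark above, here is one possible sketch in Haskell. The names MaybeK, failK, catchK, toMaybe, safeDiv and example are all hypothetical and chosen for this illustration; the encoding (a success continuation plus a failure result) is one common way to do it, not necessarily the exact construction the answer has in mind.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Maybe-like monad in continuation-passing style: a computation receives
-- a success continuation and a value to use on failure.
newtype MaybeK a = MaybeK
  { runMaybeK :: forall r. (a -> r) -> r -> r }

instance Functor MaybeK where
  fmap f m = MaybeK (\ok bad -> runMaybeK m (ok . f) bad)

instance Applicative MaybeK where
  pure x = MaybeK (\ok _ -> ok x)
  mf <*> mx = MaybeK (\ok bad ->
    runMaybeK mf (\f -> runMaybeK mx (ok . f) bad) bad)

instance Monad MaybeK where
  -- No case analysis here: the failure result is only threaded through.
  m >>= f = MaybeK (\ok bad ->
    runMaybeK m (\x -> runMaybeK (f x) ok bad) bad)

-- Abort the computation.
failK :: MaybeK a
failK = MaybeK (\_ bad -> bad)

-- The only place where failure is actually "checked": on failure of m,
-- run the handler instead.
catchK :: MaybeK a -> MaybeK a -> MaybeK a
catchK m handler =
  MaybeK (\ok bad -> runMaybeK m ok (runMaybeK handler ok bad))

-- Convert back to an ordinary Maybe at the boundary.
toMaybe :: MaybeK a -> Maybe a
toMaybe m = runMaybeK m Just Nothing

safeDiv :: Int -> Int -> MaybeK Int
safeDiv _ 0 = failK
safeDiv x y = pure (x `div` y)

example :: Int -> MaybeK Int
example d = do
  a <- safeDiv 100 d
  b <- safeDiv a 2
  pure (a + b)
```

Here toMaybe (example 5) is Just 30, toMaybe (example 0) is Nothing, and toMaybe (example 0 `catchK` pure (-1)) is Just (-1). Note that >>= never inspects a Just/Nothing tag, which is the point the answer makes about only paying for the check where it matters.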

To provide a more direct F#-specific answer (even though Derek already covered that too), the continuation monad pretty much captures the core of how asynchronous workflows work.
A continuation monad is a function that, when given a continuation, eventually calls the continuation with the result (it may never call it or it may call it repeatedly too):
type Cont<'T> = ('T -> unit) -> unit
F# asynchronous computations are a bit more complex - they take continuation (in case of success), exception and cancellation continuations and also include the cancellation token. Using a slightly simplified definition, F# core library uses (see the full definition here):
type AsyncParams =
    { token : CancellationToken
      econt : exn -> unit
      ccont : exn -> unit }
type Async<'T> = ('T -> unit) * AsyncParams -> unit
As you can see, if you ignore AsyncParams, it is pretty much the continuation monad. In F#, I think the "classical" monads are more useful as an inspiration than as a direct implementation mechanism. Here, the continuation monad provides a useful model of how to handle certain kinds of computations - and with many additional async-specific aspects, the core idea can be used to implement asynchronous computations.
I think this is quite different to how monads are used in classic academic works or in Haskell, where they tend to be used "as is" and perhaps composed in various ways to construct more complex monads that capture more complex behaviour.
This may be just my personal opinion, but I'd say that the continuation monad is not practically useful in itself, but it is a basis for some very practical ideas. (Just like lambda calculus is not really practically useful in itself, but it can be seen as an inspiration for nice practical languages!)

I certainly find it easier to read a recursive function implemented using the continuation monad compared to one implemented using explicit recursion. For example, given this tree type:
type 'a Tree =
    | Node of 'a * 'a Tree * 'a Tree
    | Empty
here's one way to write a bottom-up fold over a tree:
let rec fold e f t = cont {
    match t with
    | Node(a,t1,t2) ->
        let! r1 = fold e f t1
        let! r2 = fold e f t2
        return f a r1 r2
    | Empty -> return e
}
This is clearly analogous to a naïve fold:
let rec fold e f t =
    match t with
    | Node(a,t1,t2) ->
        let r1 = fold e f t1
        let r2 = fold e f t2
        f a r1 r2
    | Empty -> e
except that the naïve fold will blow the stack when called on a deep tree because it's not tail recursive, while the fold written using the continuation monad won't. You can of course write the same thing using explicit continuations, but to my eye the amount of clutter that they add distracts from the structure of the algorithm (and putting them in place is not completely fool-proof):
let rec fold e f t k =
    match t with
    | Node(a,t1,t2) ->
        fold e f t1 (fun r1 ->
        fold e f t2 (fun r2 ->
        k (f a r1 r2)))
    | Empty -> k e
Note that in order for this to work, you'll need to modify your definition of ContinuationMonad to include
member this.Delay f v = f () v

Monads in Haskell and Purity

My question is whether monads in Haskell actually maintain Haskell's purity, and if so how. Frequently I have read about how side effects are impure but that side effects are needed for useful programs (e.g. I/O). In the next sentence it is stated that Haskell's solution to this is monads. Then monads are explained to some degree or another, but not really how they solve the side-effect problem.
I have seen this and this, and my interpretation of the answers is actually one that came to me in my own readings -- the "actions" of the IO monad are not the I/O themselves but objects that, when executed, perform I/O. But it occurs to me that one could make the same argument for any code or perhaps any compiled executable. Couldn't you say that a C++ program only produces side effects when the compiled code is executed? That all of C++ is inside the IO monad and so C++ is pure? I doubt this is true, but I honestly don't know in what way it is not. In fact, didn't Moggi (sp?) initially use monads to model the denotational semantics of imperative programs?
Some background: I am a fan of Haskell and functional programming and I hope to learn more about both as my studies continue. I understand the benefits of referential transparency, for example. The motivation for this question is that I am a grad student and I will be giving two one-hour presentations to a programming languages class, one covering Haskell in particular and the other covering functional programming in general. I suspect that the majority of the class is not familiar with functional programming, maybe having seen a bit of Scheme. I hope to be able to (reasonably) clearly explain how monads solve the purity problem without going into category theory and the theoretical underpinnings of monads, which I wouldn't have time to cover and anyway I don't fully understand myself -- certainly not well enough to present.
I wonder if "purity" in this context is not really well-defined?

It's hard to argue conclusively in either direction because "pure" is not particularly well-defined. Certainly, something makes Haskell fundamentally different from other languages, and it's deeply related to managing side-effects and the IO type¹, but it's not clear exactly what that something is. Given a concrete definition to refer to we could just check if it applies, but this isn't easy: such definitions will tend to either not match everyone's expectations or be too broad to be useful.
So what makes Haskell special, then? In my view, it's the separation between evaluation and execution.
The base language, closely related to the λ-calculus, is all about the former. You work with expressions that evaluate to other expressions, 1 + 1 to 2. No side-effects here, not because they were suppressed or removed but simply because they don't make sense in the first place. They're not part of the model² any more than, say, backtracking search is part of the model of Java (as opposed to Prolog).
If we just stuck to this base language with no added facilities for IO, I think it would be fairly uncontroversial to call it "pure". It would still be useful as, perhaps, a replacement for Mathematica. You would write your program as an expression and then get the result of evaluating the expression at the REPL. Nothing more than a fancy calculator, and nobody accuses the expression language you use in a calculator of being impure³!
But, of course, this is too limiting. We want to use our language to read files and serve web pages and draw pictures and control robots and interact with the user. So the question, then, is how to preserve everything we like about evaluating expressions while extending our language to do everything we want.
The answer we've come up with? IO. A special type of expression that our calculator-like language can evaluate which corresponds to doing some effectful actions. Crucially, evaluation still works just as before, even for things in IO. The effects get executed in the order specified by the resulting IO value, not based on how it was evaluated. IO is what we use to introduce and manage effects into our otherwise-pure expression language.
I think that's enough to make describing Haskell as "pure" meaningful.
footnotes
¹ Note how I said IO and not monads in general: the concept of a monad is immensely useful for dozens of things unrelated to input and output, and the IO type has to be more than just a monad to be useful. I feel the two are linked too closely in common discourse.
² This is why unsafePerformIO is so, well, unsafe: it breaks the core abstraction of the language. This is the same as, say, putzing with specific registers in C: it can both cause weird behavior and stop your code from being portable because it goes below C's level of abstraction.
³ Well, mostly, as long as we ignore things like generating random numbers.

A function with type, for example, a -> IO b always returns an identical IO action when given the same input; it is pure in that it cannot possibly inspect the environment, and obeys all the usual rules for pure functions. This means that, among other things, the compiler can apply all of its usual optimization rules to functions with an IO in their type, because it knows they are still pure functions.
Now, the IO action returned may, when run, look at the environment, read files, modify global state, whatever, all bets are off once you run an action. But you don't necessarily have to run an action; you can put five of them into a list and then run them in reverse of the order in which you created them, or never run some of them at all, if you want; you couldn't do this if IO actions implicitly ran themselves when you created them.
Consider this silly program:
main :: IO ()
main = do
  inputs <- take 5 . lines <$> getContents
  let [line1,line2,line3,line4,line5] = map print inputs
  line3
  line1
  line2
  line5
If you run this, and then enter 5 lines, you will see them printed back to you but in a different order, and with one omitted, even though our Haskell program runs map print over them in the order they were received. You couldn't do this with C's printf, because it immediately performs its IO when called; Haskell's version just returns an IO action, which you can still manipulate as a first-class value and do whatever you want with.
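The same point can be checked without typing input by hand. The following sketch (the names mkActions and demo are hypothetical, and the IORef log is just a convenient stand-in for printing) builds a list of actions, confirms that nothing has happened yet, and only then runs them in reverse order.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Building this list performs no I/O; it only describes three actions,
-- each of which appends its number to the log when it is eventually run.
mkActions :: IORef [Int] -> [IO ()]
mkActions ref = [ modifyIORef ref (++ [n]) | n <- [1, 2, 3] ]

-- Returns the log as seen before and after running the actions.
demo :: IO ([Int], [Int])
demo = do
  ref <- newIORef []
  let actions = mkActions ref
  before <- readIORef ref       -- nothing has run yet
  sequence_ (reverse actions)   -- run them in reverse creation order
  after <- readIORef ref
  pure (before, after)
```

Running demo yields ([],[3,2,1]): the log is empty after the actions are created, and reflects execution order rather than creation order once they are run.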

I see two main differences here:
1) In Haskell, you can do things that are not in the IO monad. Why is this good? Because if you have a function definitelyDoesntLaunchNukes :: Int -> IO Int you don't know that the resulting IO action doesn't launch nukes, it might for all you know. cantLaunchNukes :: Int -> Int will definitely not launch any nukes (barring any ugly hacks that you should avoid in nearly all circumstances).
2) In Haskell, it's not just a cute analogy: IO actions are first class values. You can put them in lists, and leave them there for as long as you want, they won't do anything unless they somehow become part of the main action. The closest that C has to that are function pointers, which are quite a bit more cumbersome to use. In C++ (and most modern imperative languages really) you have closures which technically could be used for this purpose, but rarely are - mainly because Haskell is pure and they aren't.
Why does that distinction matter here? Well, where are you going to get your other IO actions/closures from? Probably, functions/methods of some description. Which, in an impure language, can themselves have side effects, rendering the attempt of isolating them in these languages pointless.

fiction-mode: Active
It was quite a challenge, and I think a wormhole could be forming in the neighbour's backyard, but I managed to grab part of a Haskell I/O implementation from an alternate reality:
class Kleisli k where
    infixr 1 >=>
    simple :: (a -> b) -> (a -> k b)
    (>=>) :: (a -> k b) -> (b -> k c) -> a -> k c
instance Kleisli IO where
    simple = primSimpleIO
    (>=>) = primPipeIO
primitive primSimpleIO :: (a -> b) -> (a -> IO b)
primitive primPipeIO :: (a -> IO b) -> (b -> IO c) -> a -> IO c
Back in our slightly-mutilated reality (sorry!), I have used this other form of Haskell I/O to define our form of Haskell I/O:
instance Monad IO where
    return x = simple (const x) ()
    m >>= k = (const m >=> k) ()
and it works!
fiction-mode: Offline
My question is whether monads in Haskell actually maintain Haskell's purity, and if so how.
The monadic interface, by itself, doesn't restrain the effects - it is only an interface, albeit a jolly-versatile one. As my little work of fiction shows, there are other possible interfaces for the job - it's just a matter of how convenient they are to use in practice.
For an implementation of Haskell I/O, what keeps the effects under control is that all the pertinent entities, be they:
IO, simple, (>=>) etc
or:
IO, return, (>>=) etc
are abstract - how the implementation defines those is kept private.
Otherwise, you would be able to devise "novelties" like this:
what_the_heck =
    do spare_world <- getWorld -- how easy was that?
       launchMissiles          -- let's mess everything up,
       putWorld spare_world    -- and bring it all back :-D
       what_the_heck           -- that was fun; let's do it again!
(Aren't you glad our reality isn't quite so pliable? ;-)
This observation extends to types like ST (encapsulated state) and STM (concurrency) and their stewards (runST, atomically etc). For types like lists, Maybe and Either, their orthodox definitions in Haskell mean no visible effects.
So when you see an interface - monadic, applicative, etc - for certain abstract types, any effects (if they exist) are contained by keeping its implementation private; safe from being used in aberrant ways.
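A small sketch of that containment for ST, using only base: the function below mutates a reference internally, yet its type is pure, because runST is the only way out and the representation of ST is hidden. The name sumST is made up for this example.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Uses a mutable accumulator internally, but callers only ever see a pure
-- function from a list to its sum; the mutation cannot leak out of runST.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x)) xs
  readSTRef acc
```

For instance, sumST [1..10] is 55, and calling it twice with the same list always gives the same answer, just like any other pure function.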

One more time...can I have an example of state monad that does what I want?

I'm trying to understand the actual need for the reader and/or state monads. All the examples I've seen (including many on stackoverflow as I've hunted for suitable examples I can use and in various books and blog articles) are of the form (pseudo code)
f = do
  foo <- ask
  do something with foo
g = do
  foo <- ask
  do something else using foo
h = runReader
  (
    f
    g
  )
In other words, call two functions and (presumably) maintain some state from one call to the next. However, I don't find this example particularly convincing as (I think) I could just have f return some state and then pass that state on to g.
I'd love to see an example, using a single integer (say) as the state to be preserved, where, rather than two sequential calls to f and then g from a central place, there is a call to f which internally calls g, and the changed state (if using the state monad) is then available back in the main routine.
Most (well actually all) the examples I have seen spend a tremendous amount of time focusing on the definition of the monad and then show how to set up a single function call. To me, it would be the ability to do nested calls and have the state carried along for the ride to demonstrate why it's useful.

Here's a non-trivial example of a stateful subroutine calling another stateful subroutine.
import Control.Monad.Trans.State
f :: State Int ()
f = do
  r <- g
  modify (+ r)
g :: State Int Int
g = do
  modify (+ 1)
  get
main = print (execState f 4)
In this example, the initial state begins at 4 and the stateful computation begins at f. f internally calls g, which increments the state to 5 and then returns the current state (still 5). This restores control to f, which binds the value 5 to r and then increments the current state by r, giving a final state of 10:
>>> main
10
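For comparison, here is a sketch of roughly what the State version amounts to when the state is threaded by hand. The names gExplicit, fExplicit and finalState are hypothetical; the point is only to show the plumbing that modify, get and the do block take care of.

```haskell
-- Each function takes the current state and returns (result, new state).

-- Corresponds to g: increment the state, then return it as the result.
gExplicit :: Int -> (Int, Int)
gExplicit s =
  let s' = s + 1
  in (s', s')

-- Corresponds to f: call g, then add g's result to the state.
fExplicit :: Int -> ((), Int)
fExplicit s0 =
  let (r, s1) = gExplicit s0
  in ((), s1 + r)

-- Corresponds to execState f 4.
finalState :: Int
finalState = snd (fExplicit 4)
```

Starting from 4, gExplicit returns (5, 5) and fExplicit then produces a final state of 10, matching the output above. With two functions this is manageable; the State monad starts to pay off as the call graph grows.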

Almost everything you can do with monads you can do without them. (Well, some are special like ST, STM, IO, etc., but that's a different story.) But:
they allow you to encapsulate many common patterns, like in this case stateful computations, and hide details or boiler-plate code that would be otherwise needed; and
there is a plethora of functions that work on any (or many) monads, which you can just specialize for a particular monad you're using.
To give an example: Often one needs to have some kind of a generator that supplies unique names, like when generating code etc. This can be easily accomplished using the state monad: Each time newName is called, it outputs a new name and increments the internal state:
import Control.Monad.State
import Data.Tree
import qualified Data.Traversable as T
type NameGen = State Int
newName :: NameGen String
newName = state $ \i -> ("x" ++ show i, i + 1)
Now let's say we have a tree that has some missing values. We'd like to supply them with such generated names. Fortunately, there is a generic function mapM that allows us to traverse any traversable structure with any monad (without the monad abstraction, we wouldn't have this function). Now fixing the tree is easy. For each value we check if it's filled (then we just use return to lift it into the monad), and if not, supply a new name:
fillTree :: Tree (Maybe String) -> NameGen (Tree String)
fillTree = T.mapM (maybe newName return)
Just imagine implementing this function without monads, with explicit state - going manually through the tree and carrying the state around. The original idea would be completely lost in boilerplate code. Moreover, the function would be very specific to Tree and NameGen.
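To give a feel for that boilerplate, here is one possible hand-written version with the counter passed explicitly. It is only a sketch: Rose is a hypothetical stand-in for Data.Tree's Tree so that the example needs nothing beyond the Prelude, and fillRose and fillForest are names invented here.

```haskell
-- A simple rose tree, standing in for Data.Tree's Tree.
data Rose a = Rose a [Rose a] deriving (Eq, Show)

-- Fill in missing labels, threading the name counter by hand.
-- Takes the next free index and returns the filled tree plus the new index.
fillRose :: Int -> Rose (Maybe String) -> (Rose String, Int)
fillRose n (Rose label kids) =
  let (label', n1) = case label of
        Just s  -> (s, n)                    -- already filled, counter unchanged
        Nothing -> ("x" ++ show n, n + 1)    -- generate a name, bump the counter
      (kids', n2) = fillForest n1 kids
  in (Rose label' kids', n2)

-- The same plumbing again, this time for a list of subtrees.
fillForest :: Int -> [Rose (Maybe String)] -> ([Rose String], Int)
fillForest n [] = ([], n)
fillForest n (t : ts) =
  let (t', n1) = fillRose n t
      (ts', n2) = fillForest n1 ts
  in (t' : ts', n2)
```

For example, fillRose 0 (Rose Nothing [Rose (Just "a") [], Rose Nothing []]) gives (Rose "x0" [Rose "a" [],Rose "x1" []], 2). Every function has to accept and return the counter, and the traversal logic is tied to this one tree type and this one kind of state.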
But with monads, we can go even further. We could parametrize the name generator and construct an even more generic function:
fillTreeM :: (Monad m) => m String -> Tree (Maybe String) -> m (Tree String)
fillTreeM gen = T.mapM (maybe gen return)
Note the first parameter m String. It's not a constant String value, it's a recipe for generating a new String within m, whenever it's needed.
Then the original one can be rewritten just as
fillTree' :: Tree (Maybe String) -> NameGen (Tree String)
fillTree' = fillTreeM newName
But now we can use the same function for many very different purposes. For example, use the Rand monad and supply randomly generated names.
Or, at some point we might consider a tree with unfilled nodes invalid. Then we just say that wherever we're asked for a new name, we instead abort the whole computation. This can be implemented just as
checkTree :: Tree (Maybe String) -> Maybe (Tree String)
checkTree = fillTreeM Nothing
where Nothing here is of type Maybe String, which, instead of trying to generate a new name, aborts the computation within the Maybe monad.
This level of abstraction would hardly be possible without the concept of monads.
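To see the functions above in action, here is a minimal runnable sketch. The sample tree and main are assumptions added for illustration; newName and fillTreeM are repeated from the answer so the block stands alone (the Prelude's mapM is used in place of T.mapM, which is the same function on recent GHC versions).

```haskell
import Control.Monad.State (State, evalState, state)
import Data.Tree (Tree (..))

type NameGen = State Int

newName :: NameGen String
newName = state $ \i -> ("x" ++ show i, i + 1)

-- Generic version from the answer: 'gen' is run once per missing value.
fillTreeM :: Monad m => m String -> Tree (Maybe String) -> m (Tree String)
fillTreeM gen = mapM (maybe gen return)

-- Hypothetical sample tree with two missing labels.
sample :: Tree (Maybe String)
sample = Node (Just "root") [Node Nothing [], Node (Just "a") [Node Nothing []]]

main :: IO ()
main = do
  -- State monad: start the counter at 0; the gaps become "x0" and "x1".
  print (evalState (fillTreeM newName sample) 0)
  -- Maybe monad: the first missing label aborts the whole traversal.
  print (fillTreeM Nothing sample)
```

The same traversal code runs in both cases; only the monad chosen for the generator changes what a missing value means.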

I'm trying to understand the actual need for the reader and/or state monads.
There are many ways to understand monads in general, and these monads in particular. In this answer, I focus on one understanding of these monads which I believe might be helpful for the OP.
The reader and state monads provide library support for very simple usage patterns:
The reader monad provides support for passing arguments to functions.
The state monad provides support for getting results out of functions and passing them to other functions.
As the OP correctly figured out, there is no great need for library support for these things, which are already very easy in Haskell. So many Haskell programs could use a reader or state monad, but there's no point in doing so, so they don't.
So why would anyone ever use a reader or state monad? I know three important reasons:
1. Realistic programs contain many functions that call each other and pass information back and forth. Sometimes, many functions take arguments that are just passed on to other functions. The reader monad is a library for this "accept arguments and pass them on" pattern. The state monad is a library for the similar "accept arguments, pass them on, and pass the results back as my result" pattern.
In this situation, a benefit of using the reader or state monad is that the arguments get passed on automatically, and we can focus on the more interesting jobs of these functions. A cost is that we have to use monadic style (do notation etc.).
2. Realistic programs can use multiple monads at once. They need arguments that are passed on, arguments that are returned, error handling, nondeterminism, ...
In this situation, a benefit of using the reader or state monad transformer is that we can package all of these monads into a single monad transformer stack. We still need monadic style, but now we pay the cost once (use do notation everywhere) and get the benefit often (multiple monads in the transformer stack).
3. Some library functions work for arbitrary monads. For example, sequence :: Monad m => [m a] -> m [a] takes a list of monadic actions, runs all of them in sequence, and returns the collected results.
A benefit of using the reader or state (or whatever) monad is that we can use these very generic library functions that work for any monad.
Note that points 1 and 2 only show up in realistic, somewhat large programs. So it is hard to give a small example for this benefit of using monads. Point 3 shows up in small library functions, but is harder to understand, because these library functions are often very abstract.
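As a small illustration of the "accept arguments and pass them on" pattern, here is a sketch that writes the same two functions once with explicit argument passing and once with the reader monad. Config, its fields, and the render functions are hypothetical names invented for this example.

```haskell
import Control.Monad.Reader (Reader, asks, runReader)

-- Hypothetical configuration that several functions need.
data Config = Config { indentWidth :: Int, bullet :: String }

-- Explicit style: every function accepts the config and passes it on.
renderItemE :: Config -> String -> String
renderItemE cfg s = replicate (indentWidth cfg) ' ' ++ bullet cfg ++ " " ++ s

renderListE :: Config -> [String] -> String
renderListE cfg = unlines . map (renderItemE cfg)

-- Reader style: the config is threaded through implicitly.
renderItem :: String -> Reader Config String
renderItem s = do
  n <- asks indentWidth
  b <- asks bullet
  return (replicate n ' ' ++ b ++ " " ++ s)

-- mapM works for any monad, so it also works for Reader.
renderList :: [String] -> Reader Config String
renderList items = unlines <$> mapM renderItem items

main :: IO ()
main = do
  let cfg = Config 2 "*"
  putStr (renderListE cfg ["one", "two"])
  putStr (runReader (renderList ["one", "two"]) cfg)
```

With only two functions the explicit version is arguably simpler, which matches the observation that the benefit only shows up in larger programs.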

ST Monad == code smell?

I'm working on implementing the UCT algorithm in Haskell, which requires a fair amount of data juggling. Without getting into too much detail, it's a simulation algorithm where, at each "step," a leaf node in the search tree is selected based on some statistical properties, a new child node is constructed at that leaf, and the stats corresponding to the new leaf and all of its ancestors are updated.
Given all that juggling, I'm not really sharp enough to figure out how to make the whole search tree a nice immutable data structure à la Okasaki. Instead, I've been playing around with the ST monad a bit, creating structures composed of mutable STRefs. A contrived example (unrelated to UCT):
import Control.Monad
import Control.Monad.ST
import Data.STRef
data STRefPair s a b = STRefPair { left :: STRef s a, right :: STRef s b }
mkStRefPair :: a -> b -> ST s (STRefPair s a b)
mkStRefPair a b = do
    a' <- newSTRef a
    b' <- newSTRef b
    return $ STRefPair a' b'
derp :: (Num a, Num b) => STRefPair s a b -> ST s ()
derp p = do
    modifySTRef (left p) (\x -> x + 1)
    modifySTRef (right p) (\x -> x - 1)
herp :: (Num a, Num b) => (a, b)
herp = runST $ do
    p <- mkStRefPair 0 0
    replicateM_ 10 $ derp p
    a <- readSTRef $ left p
    b <- readSTRef $ right p
    return (a, b)
main = print herp -- should print (10, -10)
Obviously this particular example would be much easier to write without using ST, but hopefully it's clear where I'm going with this... if I were to apply this sort of style to my UCT use case, is that wrong-headed?
Somebody asked a similar question here a couple years back, but I think my question is a bit different... I have no problem using monads to encapsulate mutable state when appropriate, but it's that "when appropriate" clause that gets me. I'm worried that I'm reverting to an object-oriented mindset prematurely, where I have a bunch of objects with getters and setters. Not exactly idiomatic Haskell...
On the other hand, if it is a reasonable coding style for some set of problems, I guess my question becomes: are there any well-known ways to keep this kind of code readable and maintainable? I'm sort of grossed out by all the explicit reads and writes, and especially grossed out by having to translate from my STRef-based structures inside the ST monad to isomorphic but immutable structures outside.

I don't use ST much, but sometimes it is just the best solution. This can be the case in several scenarios:
There are already well-known, efficient ways to solve a problem. Quicksort is a perfect example of this. It is known for its speed and in-place behavior, which cannot be imitated by pure code very well.
You need rigid time and space bounds. Especially with lazy evaluation (and Haskell doesn't even specify whether there is lazy evaluation, just that it is non-strict), the behavior of your programs can be very unpredictable. Whether there is a memory leak could depend on whether a certain optimization is enabled. This is very different from imperative code, which has a fixed set of variables (usually) and defined evaluation order.
You've got a deadline. Although the pure style is almost always better practice and cleaner code, if you are used to writing imperatively and need the code soon, starting imperative and moving to functional later is a perfectly reasonable choice.
When I do use ST (and other monads), I try to follow these general guidelines:
Use Applicative style often. This makes the code easier to read and, if you do switch to an immutable version, much easier to convert. Not only that, but Applicative style is much more compact.
Don't just use ST. If you program only in ST, the result will be no better than a huge C program, possibly worse because of the explicit reads and writes. Instead, intersperse pure Haskell code where it applies. I often find myself using things like STRef s (Map k [v]). The map itself is being mutated, but much of the heavy lifting is done purely.
Don't remake libraries if you don't have to. A lot of code written for IO can be cleanly, and fairly mechanically, converted to ST. Replacing all the IORefs with STRefs and IOs with STs in Data.HashTable was much easier than writing a hand-coded hash table implementation would have been, and probably faster too.
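The STRef s (Map k [v]) shape mentioned above can be sketched as follows. groupByKey is a hypothetical example function; a plain fold would do the same job here, so the point is only to show one mutable cell whose updates are ordinary pure Map operations.

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)
import qualified Data.Map.Strict as Map

-- Group values by key, keeping values in input order.
-- The STRef is the only mutable piece; each update is a pure Map function.
groupByKey :: Ord k => [(k, v)] -> Map.Map k [v]
groupByKey pairs = runST $ do
  ref <- newSTRef Map.empty
  forM_ pairs $ \(k, v) ->
    -- insertWith passes the new value first, so append it after the old ones.
    modifySTRef' ref (Map.insertWith (\new old -> old ++ new) k [v])
  readSTRef ref

main :: IO ()
main = print (groupByKey [(1 :: Int, 'a'), (2, 'b'), (1, 'c')])
```

Because the result leaves runST as an immutable Map, no translation between mutable and immutable structures is needed at the boundary.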
One last note - if you are having trouble with the explicit reads and writes, there are ways around it.

Algorithms which make use of mutation and algorithms which do not are different algorithms. Sometimes there is a straightforward bounds-preserving translation from the former to the latter, sometimes a difficult one, and sometimes only one which does not preserve complexity bounds.
From a skim of the paper, I don't think it makes essential use of mutation -- and so I think a potentially really nifty lazy functional algorithm could be developed. But it would be a different but related algorithm to that described.
Below, I describe one such approach -- not necessarily the best or most clever, but pretty straightforward:
Here's the setup as I understand it -- A) a branching tree is constructed B) payoffs are then pushed back from the leaves to the root, which then indicates the best choice at any given step. But this is expensive, so instead, only portions of the tree are explored to the leaves in a nondeterministic manner. Furthermore, each further exploration of the tree is determined by what's been learned in previous explorations.
So we build code to describe the "stage-wise" tree. Then, we have another data structure to define a partially explored tree along with partial reward estimates. We then have a function of randseed -> ptree -> ptree that given a random seed and a partially explored tree, embarks on one further exploration of the tree, updating the ptree structure as we go. Then, we can just iterate this function over an empty seed ptree to get a list of increasingly more sampled spaces in the ptree. We then can walk this list until some specified cutoff condition is met.
So now we've gone from one algorithm where everything is blended together to three distinct steps -- 1) building the whole state tree, lazily, 2) updating some partial exploration with some sampling of a structure and 3) deciding when we've gathered enough samples.
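The driver part of this design (iterate an exploration step, then walk the resulting list until a cutoff) can be sketched as below. Everything here is a stand-in: Seed, nextSeed, the toy PTree that only holds a sample count and a reward total, and the fake explore step are assumptions made so the sketch runs without extra packages. A real version would replace PTree and explore with the partially explored tree and the actual sampling.

```haskell
-- Hypothetical stand-in for a random seed.
type Seed = Int

-- Toy linear congruential step, only to avoid depending on a random library.
nextSeed :: Seed -> Seed
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Toy "partially explored tree": just a sample count and a reward total.
data PTree = PTree { samples :: Int, rewardSum :: Double } deriving Show

emptyPTree :: PTree
emptyPTree = PTree 0 0

-- One further exploration (the randseed -> ptree -> ptree function).
-- This fake version records a reward of 0 or 1 instead of walking a tree.
explore :: Seed -> PTree -> PTree
explore s (PTree n r) = PTree (n + 1) (r + fromIntegral (s `mod` 2))

-- The lazy, infinite list of increasingly well-sampled trees.
explorations :: Seed -> [PTree]
explorations seed0 = map snd (iterate step (seed0, emptyPTree))
  where
    step (s, p) = (nextSeed s, explore s p)

-- Walk the list until the cutoff condition holds.
searchUntil :: (PTree -> Bool) -> Seed -> PTree
searchUntil done = head . dropWhile (not . done) . explorations

main :: IO ()
main = print (samples (searchUntil ((>= 100) . samples) 42))
```

The cutoff is an ordinary predicate, so it can be swapped (sample count, time budget, confidence bound) without touching the exploration step.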

It can be really difficult to tell when using ST is appropriate. I would suggest you do it with ST and without ST (not necessarily in that order). Keep the non-ST version simple; using ST should be seen as an optimization, and you don't want to do that until you know you need it.

I have to admit that I cannot read the Haskell code. But if you use ST for mutating the tree, then you can probably replace this with an immutable tree without losing much because:
Same complexity for mutable and immutable tree
You have to mutate every node above the new leaf. An immutable tree has to replace all nodes above the modified node. So in both cases the touched nodes are the same, thus you don't gain anything in complexity.
In Java, for example, object creation is more expensive than mutation, so maybe you can gain a bit in Haskell by using mutation, but I don't know this for sure. In any case, a small gain does not buy you much because of the next point.
Updating the tree is presumably not the bottleneck
The evaluation of the new leaf will probably be much more expensive than updating the tree. At least this is the case for UCT in computer Go.
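The "replace all nodes above the modified node" update can be sketched as below. The Node type, its fields, and update are hypothetical names for illustration, not taken from any UCT implementation; a path is given as a list of child indices.

```haskell
-- Hypothetical node statistics for a UCT-style search tree.
data Node = Node { visits :: Int, reward :: Double, children :: [Node] }
  deriving (Eq, Show)

-- Record one simulation result along a path of child indices.
-- Only the nodes on the path are rebuilt; sibling subtrees are shared
-- with the old tree (the child list spine at each level is copied).
update :: Double -> [Int] -> Node -> Node
update r path (Node v w cs) = Node (v + 1) (w + r) cs'
  where
    cs' = case path of
      []       -> cs
      (i : is) -> [ if j == i then update r is c else c
                  | (j, c) <- zip [0 ..] cs ]

main :: IO ()
main = do
  let leaf = Node 0 0 []
      tree = Node 0 0 [leaf, Node 0 0 [leaf]]
  -- Credit a reward of 1 to the root, its second child, and that child's first child.
  print (update 1 [1, 0] tree)
```

The old tree stays valid after the update, which is what makes it cheap to keep earlier versions around for comparison or backtracking.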

Use of the ST monad is usually (but not always) as an optimization. For any optimization, I apply the same procedure:
Write the code without it.
Profile and identify bottlenecks.
Incrementally rewrite the bottlenecks and test for improvements/regressions.
The other use case I know of is as an alternative to the state monad. The key difference being that with the state monad the type of all of the data stored is specified in a top-down way, whereas with the ST monad it is specified bottom-up. There are cases where this is useful.

Creative uses of monads

I'm looking for creative uses of monads to learn from. I've read somewhere that monads have been used for example in AI, but being a monad newbie, I fail to see how.
Please include a link to the source code and sample usages. No standard monads please.

Phil Wadler has written many papers on monads, but the one to read first is a lot of fun and will be accessible to any programmer; it's called The essence of functional programming. The paper includes source code and sample usages.
A personal favorite of mine is the probability monad; if you can find Sungwoo Park's PhD thesis, it has a number of interesting code examples from robotics.

There's also LogicT (backtracking monad transformer with fair operations and pruning).
It has good value to AI Search algorithms because of its constructs for fair disjunctions, for example, easily enabling computations that succeed an infinite number of times to be combined (interleaved).
Its usage is described in the ICFP'05 paper Backtracking, Interleaving, and Terminating Monad Transformers.

You can find interesting and advanced monads in the blog A Neighborhood of Infinity. I can note the Vector Space Monad, and its use for describing rational tangles. Unfortunately, I don't think I understand this well enough to explain it here.

One of my favorite monads is Martin Escardo's search monad. It can be found on Hackage in the infinite-search package.
It is the monad of "search functions" for a set of elements of type a, namely (a -> Bool) -> Maybe a (finding an element in the set matching a given predicate).

One interesting use of monads is in parsing. Parsec is the standard example.

Read the series of articles on monads used to model probability and probabilistic processes here: http://www.randomhacks.net/articles/2007/03/03/smart-classification-with-haskell (follow the links to the previous/next parts).

Harpy, a package for run-time generation of x86 machine code, uses a code generation monad. From the description:
This is a combined reader-state-exception monad which handles all the details of handling code buffers, emitting binary data, relocation etc.
All the code generation functions in module Harpy.X86CodeGen live in this monad and use its error reporting facilities as well as the internal state maintained by the monad.
The library user can pass a user environment and user state through the monad. This state is independent from the internal state and may be used by higher-level code generation libraries to maintain their own state across code generation operations.
I found this a particularly interesting example because I think that this pattern is not uncommon: I'd invented something quite similar myself for generating a set of internal messages for my application based on messages received from a (stock) market data feed. It turns out to be an extremely comfortable way to have a framework keep track of various "global" things whilst composing simple operations that in and of themselves keep no state.
I extended his idea of having a user state (which I call a "substate") that could also be passed through the monad: I have a mechanism for switching out and restoring state during the monad run:
-- | Given a generator that uses different substate type, convert it
-- to a generator that runs with our substate type. As well as the
-- other-substate-type generator, the caller must provide an initial
-- substate for that generator and a function taking the final substate
-- of the generator and producing a new substate of our type. This
-- preserves all other (non-substate) parts of the master state touched
-- by the generator.
--
mgConvertSubstate :: MsgGen msg st' a -> st' -> (st' -> st) -> MsgGen msg st a
This is used for subgroups of combinators that had their own state needed for a short period. These run with just their state, not knowing anything about the state of the generator that invoked it (which helps make things more modular), and yet this preserves any non-user-specific state, such as the current list of messages generated and the current set of warnings or errors, as well as the control flow (i.e., allowing total aborts to flow upwards).

I'd like to list a couple of monads not yet mentioned in other answers.
Enumerate and weighted search monads
The Omega monad can be used to productively traverse infinite lists of results. Compare:
>>> take 10 $ liftM2 (,) [0..] [0..]
[(0,0),(0,1),(0,2),(0,3),(0,4),(0,5),(0,6),(0,7),(0,8),(0,9)]
>>> take 10 $ runOmega $ liftM2 (,) (each' [0..]) (each' [0..])
[(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0)]
With the slightly more advanced WeightedSearch monad it is also possible to assign weights to computations, so that results of computations with lower weights appear first in the output.
Accumulating errors monad
The These data type forms a Monad similar to Either, but one that is able to accumulate errors rather than stopping at the first one. The package also defines a MonadChronicle class as well as a ChronicleT monad transformer based on These.
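To make the accumulating behaviour concrete, here is a self-contained re-implementation sketch of a These-like type with a Monad instance. It is written from scratch for illustration and is not the package's source; checkAge and its messages are hypothetical.

```haskell
import Control.Monad (ap)

-- Either a fatal error (This), a plain result (That),
-- or a result together with accumulated warnings (These).
data These a b = This a | That b | These a b
  deriving (Eq, Show)

instance Functor (These a) where
  fmap _ (This a)    = This a
  fmap f (That b)    = That (f b)
  fmap f (These a b) = These a (f b)

instance Semigroup a => Applicative (These a) where
  pure  = That
  (<*>) = ap

instance Semigroup a => Monad (These a) where
  This a    >>= _ = This a
  That x    >>= k = k x
  These a x >>= k = case k x of
    This b    -> This (a <> b)     -- fatal, but earlier warnings are kept
    That y    -> These a y
    These b y -> These (a <> b) y

-- Hypothetical validator: negative is fatal, very large is only a warning.
checkAge :: Int -> These [String] Int
checkAge n
  | n < 0     = This ["negative age"]
  | n > 150   = These ["suspiciously large age"] n
  | otherwise = That n

main :: IO ()
main = do
  print (do a <- checkAge 200; b <- checkAge 30; return (a + b))
  print (checkAge 200 >> checkAge (-1))
```

Unlike Either, a warning does not stop the computation: the first expression still produces a sum, and the second keeps the earlier warning alongside the fatal error.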
